01 AUG. 2002
Received.



Methods of mixing large numbers of heterologous genes. 

All patent and non-patent references cited in the application, or in the present
application, are also hereby incorporated by reference in their entirety.

Field of invention 

The present invention relates to methods of mixing large numbers of heterologous
genes, which are located on artificial chromosomes. The methods of the present
invention are useful for evolution of cells and whole genomes to acquire new
functionality(ies), such as the ability to synthesise novel secondary metabolites
and/or the evolution of novel metabolic pathways.

Background of invention



Recombination of cells in order to optimise or produce heterologous proteins is a 
well-established practice in molecular biology. 

The traditional approach to engineered molecular evolution relates to optimisation of
an individual gene having a specific phenotype. The strategy is to clone a gene,
identify a function for the gene and an assay for selecting the gene, mutate selected
positions in the gene and select variants of the gene for improvement in the known
function of the gene. A variant having a desired function may then be expressed in a
suitable host cell.



However, the traditional approach has several drawbacks when it comes to
evolution of cells having new properties, since the approach only relates to discrete
genes. Multiple genes that cooperatively confer a single phenotype cannot be
optimised in this manner. Furthermore, the traditional approach only leads to a very
limited number of combinations or permutations in a cell or even for a single gene.

Evolution of cells having new properties has been described in, for example, WO
98/31837, wherein a method of evolving cells towards acquisition of new properties
employing iterative cycles of recombination and selection/screening for evolution is
discussed.

In WO 97/35966 a process of recursive sequence recombination in order to evolve
new metabolic pathways is discussed, and in WO 00/04190 a process of recursive
sequence recombination in order to evolve whole cells and organisms having
desired properties is discussed.

Whether using the traditional approach of optimising individual genes or conducting
iterative cycles of recombination, the individual genes in the cells in question are
recombined, i.e. changed with foreign genetic material, evolving new genes.

A major drawback when evolving new genes in this manner is that each cycle of
recombination may just as well result in a failure, leading to a nonsense gene, as in
a success, leading to an optimised gene. Furthermore, these methods are not very
useful for evolution of novel metabolic pathways, since very few gene combinations
are produced.

WO 98/34112 discloses a combinatorial gene expression library with a pool of
expression constructs, each construct containing a cDNA or genomic DNA fragment
from a plurality of donor organisms, with the purpose of generating new metabolic
pathways. The publication also discloses a combinatorial gene expression library in
which each cell comprises a concatemer of cDNA fragments being operably
associated with regulatory regions to drive expression of the genes encoded by the
concatenated cDNA in a host organism. Once the gene cassettes have been cloned
into the vector, it is not possible to excise the cassettes or the complete concatemer
from the vector using a restriction enzyme. The reference is silent on the possibility
of changing and optimising the combination of genes in each cell.

Consequently there is a need for developing methods for generation and
optimisation of novel metabolic pathways.

Summary of invention 

A first aspect of the invention relates to a method of mixing heterologous genes in
expression cassettes located on artificial chromosomes, said method comprising the
steps of:
providing two initial populations of cells that can mate with each other,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first type of artificial chromosome, the at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each type
of artificial chromosome from each population can be individually selected for,
mating the cells with each other, and
selecting mated cells that carry at least a subset of the selectable markers present
on the artificial chromosomes in the two initial populations.

By "a type of artificial chromosome" is meant a group of artificial chromosomes
sharing the same selectable markers. According to the present invention such a
group of artificial chromosomes comprises artificial chromosomes with different
expression cassettes or with different combinations of expression cassettes.
Methods for generation of such artificial chromosomes and the differences between
artificial chromosomes within the same type are disclosed in the detailed description
part of the invention.

The method provides a possibility for changing the combination of genes located on
artificial chromosomes in a cell by using simple means such as mating and subsequent
meiosis. Thus the methods of the present invention provide solutions to the
problem of obtaining further mixing of genes that have already been mixed as they
were selected for insertion into the two initial populations.

The presence of the artificial chromosomes in the mated cells is ensured by the
use of selectable markers located on all the artificial chromosomes, that select for
mated cells. After mating any sub-set of types of artificial chromosomes that select
for mated cells can be used. One example of this is selecting for marker
combinations present on at least one type of artificial chromosome present in each
of the initial populations.

However, in order to conserve the majority of genes while keeping the number of
selective media/conditions reasonably low, the subset of the selectable markers
selected for should include selection for at least 70% of all diploid types present in
the mated population. More preferably the subset of the selectable markers selected
for should include selection for at least 80% of all diploid types present in the mated
population, such as at least 90%, for example at least 95%, such as at least 99%,
for example 100%.
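
The trade-off between the number of selective conditions and the share of diploid types retained can be illustrated with a small sketch. It rests on assumptions that are not part of the application: each selective condition is modelled as the set of artificial chromosome types whose markers are selected for together, each diploid type is counted once regardless of its frequency in the mated population, and the type names A, B, C and D and all function names are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(types):
    """Every non-empty combination of the given artificial chromosome types."""
    return [frozenset(c) for r in range(1, len(types) + 1)
            for c in combinations(types, r)]

def diploid_types(pop_1, pop_2):
    """Cells carrying at least one chromosome type from each initial population."""
    return [a | b for a in nonempty_subsets(pop_1) for b in nonempty_subsets(pop_2)]

def coverage(conditions, pop_1, pop_2):
    """Fraction of diploid types that survive at least one selective condition.

    Assumption: a cell survives a condition only if it carries every
    chromosome type that the condition selects for.
    """
    all_types = diploid_types(pop_1, pop_2)
    kept = [t for t in all_types if any(set(c) <= t for c in conditions)]
    return len(kept) / len(all_types)

# Hypothetical example: types A and B in one population, C and D in the other.
two_media = coverage([{"A", "C"}, {"B", "D"}], "AB", "CD")
four_media = coverage([{"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"}], "AB", "CD")
print(f"{two_media:.0%} with two media, {four_media:.0%} with four")
```

Under these assumptions two conditions already retain 7 of the 9 diploid types, which is above the 70% level, while four conditions retain all of them.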

The process may be continued by allowing the mated cells to subsequently undergo
meiosis. Preferably meiosis is performed under conditions where cells without
artificial chromosomes and cells that have not undergone meiosis do not survive.
Standard protocols are available to do this, e.g. for yeast cells.

Once cells have undergone meiosis and until the next mating round, the cells are
kept under conditions where cells without artificial chromosomes do not survive. As
the mixing method results in the generation of novel gene combinations, it may
advantageously be combined with screening of mated cells for a parameter related
to a desired functionality(ies) and selecting cells having a predefined selection
criterion(a) to undergo meiosis and mating. By screening cells at the diploid level,
more genes are present in each cell, and consequently the chance of generating
novel metabolic pathways is higher than at the haploid level.

Screening may alternatively comprise screening cells that have undergone meiosis
for a parameter related to a desired functionality(ies) and selecting cells having a
predefined selection criterion(a) to undergo mating and meiosis.

The mixing, mating, selection and further meiosis steps may be repeated to
stepwise optimise the gene combinations or evolve novel gene combinations by
repeating the process at least twice, such as 3 times, for example 4 times, such as 5
times, for example 6 times, such as 7 times, for example 8 times, such as 9 times,
for example 10 times, such as 11 times, for example 12 times, such as 13 times, for
example 14 times, such as 15 times, for example 16 times, such as 17 times, for
example 18 times, such as 19 times, for example 20 times, such as 25 times, for
example at least 30 times, such as at least 40 times, for example at least 50 times,
such as at least 75 times, for example at least 100 times, such as at least 200 times,
for example at least 300 times, such as at least 500 times, for example at least 1000
times.



In order to reduce the accumulation of mutations in the host cells the method may
further comprise subjecting the populations of cells to physical isolation of artificial
chromosomes from the populations for every 4-5 rounds of meiosis and selection,
and transferring the isolated artificial chromosomes into new host cells. The physical
isolation may e.g. comprise amplification of artificial chromosomes in the host cells
so that the artificial chromosomes constitute up to 20% of total DNA in the cells.
Alternatively the physical isolation may comprise cutting expression cassettes from
concatemers of expression cassettes on artificial chromosomes, cloning and
amplification of these and re-assembling expression cassettes into an artificial
chromosome vector backbone and transforming these into new host cells.

After meiosis, cells of the mating types may be separated from each other.

When working with a spore forming host species, the method may comprise mixing
spores from different populations prior to mating. The two populations can then be
mixed very efficiently before mating and thus there will be less mating within each
population.



As there is no guarantee that the mixing method according to the invention results in
improved genotypes compared to the initial populations or to early rounds of mixing
and selection, the method preferably comprises storing a sub-population of mated
and selected cells, while another sub-population undergoes further meiosis and
mating. According to this embodiment, the method preferably additionally comprises
screening of at least a stored sub-population together with a population that has
undergone at least one further round of meiosis and mating at a higher selection
threshold than the previous screening, selecting cells above the higher selection
threshold, and mating the selected cells with each other. The method may also
comprise screening together with the stored sub-population and/or a population that
has undergone at least one further round of meiosis, mating and screening, cells
that contain expression cassettes or combinations of cassettes that have not been
screened before.

It is also possible to add a further population of cells with artificial chromosomes
comprising at least two expression cassettes with heterologous genes, the cells
being capable of mating with the cells that have undergone mating and meiosis, the
further population comprising at least 2 cells with combinations of expression
cassettes different from the combinations in the cells of the initial population, the
artificial chromosomes of said further population carrying at least one selectable
marker. Preferably the artificial chromosomes of said further population have the
same markers as the initial populations. The further population may comprise a
50/50 mixture of cells of the two mating types of the initial populations or it may
comprise cells of one of the mating types of the initial populations.

At least one of the two initial populations of cells that can mate with each other may
further carry at least a second type of artificial chromosome with expression
cassettes comprising heterologous genes, the first and second types of artificial
chromosome carrying at least one selectable marker so that said first and second
types of artificial chromosome can be individually selected for. More preferably, at
least one of the two initial populations of cells that can mate with each other further
carries at least a third type of artificial chromosome with expression cassettes
comprising heterologous genes, the first, second, and third types of artificial
chromosome carrying at least one selectable marker so that said first, second, and
third type of artificial chromosome can be individually selected for. More preferably
at least one of the two initial populations of cells that can mate with each other
further carries at least a fourth type of artificial chromosome with expression
cassettes comprising heterologous genes, the first, second, third, and fourth type of
artificial chromosome carrying at least one selectable marker so that said first,
second, third, and fourth type of artificial chromosome can be individually selected
for.



More generally speaking the two initial populations of cells that can mate with each
other may carry from 1 to 10 types of artificial chromosomes, each artificial
chromosome of each population carrying at least one selectable marker so that
each of the types of artificial chromosomes from each of the two populations can be
individually selected for.

Similarly, the further population of cells with artificial chromosomes capable of
mating with the cells that have undergone mating and meiosis may carry from 1 to
10 types of artificial chromosomes, each type of artificial chromosome of said further
population carrying at least one selectable marker so that each of the types of
artificial chromosomes can be individually selected for.

According to one embodiment each cell may carry 2 artificial chromosomes per cell
that can mate. According to another embodiment each cell may carry 3 artificial
chromosomes per cell that can mate. This number of artificial chromosomes per cell
is today considered a practical number, at least in the case where yeast is the host
species. This is because this number of artificial chromosomes can be efficiently
transferred into and be stably maintained at least in yeast and also because too high
a number of artificial chromosomes may cause centromere toxicity, at least when all
the centromeres of the artificial chromosomes are identical. It is expected that within
the term of the present patent, methods will be developed for stable maintenance of
a higher number of artificial chromosomes in yeast and possibly also in other
species.



Preferably each artificial chromosome carries at least two selectable markers, the
selectable markers being allocated to types of artificial chromosomes so that each
type of artificial chromosome from each population can be individually selected for.
In this case, one marker is normally located on each arm of the artificial
chromosomes to ensure that the artificial chromosomes do not lose any of the arms.
Artificial chromosomes have a tendency to lose one of the arms.

Preferably, all artificial chromosomes carry a common marker so that it is possible at
any point to select for cells that contain at least one artificial chromosome. Suitable
selectable markers are selected from drug resistance, colour, morphology,
resistance against electromagnetic radiation, salt tolerance, O2 resistance, markers
based on fluorescence probes, and auxotrophy markers, markers that can be used
to produce high copy numbers (e.g. a poorly expressed LEU2 gene (leu-2d),
heterologous thymidine kinase (TK), heterologous dihydrofolate reductase gene),
heterologous genes that give a growth advantage. More preferably the markers are
auxotrophy markers.

Specific examples of these markers include but are not limited to: NPT II, LEU 2,
TRP 1, HIS 3, LYS 2, URA 3, ADE 2, amyloglucosidase, β-lactamase, CUP 1,
G418^R, TUN^R, KILk1, C230, SMR1, SFA, hygromycin^R, methotrexate^R,
chloramphenicol^R, diuron^R, zeocin^R, canavanine^R, ARG 4, THR, luciferase, GUS,
GFP, LUX.

When possible, it is preferred that the two initial populations are of different mating
types. The size of the initial populations may be selected so that the two initial
populations have approximately the same number of cells. Alternatively, the number
of cells in one population is higher than the number of cells in the other population.

In a further aspect the invention relates to a method of mixing heterologous genes in
expression cassettes located on artificial chromosomes, said method comprising the
steps of
providing two initial populations of protoplasts or cells that can be fused,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first type of artificial chromosome, said at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each type
of artificial chromosome from each population can be individually selected for,
performing protoplast fusion and regeneration of cell walls or performing fusion of
cells, and
selecting fused cells that carry at least a subset of the selectable markers present
on the artificial chromosomes in the two initial populations.

As for the first aspect of this invention, the second aspect provides a method for
mixing and optimising gene combinations for expressible genes located on artificial
chromosomes. The method may be used even in cases where it is not possible to
mate cells in vitro.




Preferably, the cells or protoplasts caused to fuse are haploid, so that the
chromosome number of the fused cells is diploid. It is also possible in some species
to fuse cells with higher ploidy level and thus obtain polyploid fused cells.

The process may be repeated one or more times.

Preferably the species of cells are selected from fungi, algae, and plants, for which
protocols for isolation and fusion of protoplasts and subsequent regeneration of cell
walls are known. More preferably, the species of cells is a fungus species, in which
it is possible to induce spore formation in vitro. Other relevant species include
prokaryotes, which can be fused.

However, it is also contemplated that it will become possible to fuse animal cells,
which do not have a cell wall, so that the species of cells may include animal cells,
including human cells.



Preferably, the species of cells is one for which extensive in vitro protocols are
known, and for which standard molecular biology methods have been developed.
These include industrial microorganisms, preferably fungi, more preferably yeast.
Suitable examples of yeast species are disclosed in the detailed description of the
present invention. Other species include carrot, Arabidopsis thaliana, Nicotiana spp.,
Nicotiana tabacum, maize, wheat, rice, soybean, tomato, peanut, potato, sugar
beets, sunflower, yam, rape seed, conifers, and petunia. The list is expected to grow
in future as the field expands.

As for the mating based method, the mixing may advantageously be combined with
screening cells that result from protoplast fusion for a desired functionality(ies) and
selecting cells having the desired functionality(ies) above a defined threshold,
isolating protoplasts from these cells and performing protoplast fusion and cell
regeneration on the selected cells.

According to a further aspect the invention relates to a method for mixing
heterologous genes in expression cassettes located on artificial chromosomes, said
method comprising the steps of
providing two initial populations of cells,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first type of artificial chromosome, the at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each type
of artificial chromosome from each population can be individually selected for,
mating the cells with each other,
amplifying the artificial chromosomes in the host cells,
isolating the artificial chromosomes,
mixing the isolated artificial chromosomes,
transferring subsets of said isolated and mixed artificial chromosomes into host
cells, and
selecting cells that carry at least a subset of the selectable markers present on the
artificial chromosomes in the two initial populations.

According to this aspect, use is made of the fact that artificial chromosomes can be
amplified in the host cells; for example, in yeast it is possible to use vectors that
permit copy number amplification. The vectors include a conditional centromere
that can be turned on or off. The disruption of the centromere activity by high levels
of transcription towards conserved centromeric elements leads to a segregation bias
during cell division wherein the mother cell receives both copies of the artificial
chromosome, in this case a YAC. At the same time, a strong selective pressure for
extra copies of the artificial chromosome can be applied by selecting for the
expression of a heterologous gene such as thymidine kinase. Selection for the TK
gene can be accomplished by adding exogenous thymidine in the presence of
methotrexate and sulfanilamide. The latter two compounds inhibit enzymes involved
in the recycling or de novo synthesis of folate cofactors required for the synthesis of
deoxythymidylic acid. When this system was used, artificial chromosomes were readily
amplified 10- to 20-fold. Reactivation of the centromere in amplified artificial
chromosome clones resulted in stable maintenance of an elevated copy number
(Smith et al., PNAS, 1990, vol 87:8242-46).

The method may advantageously be combined with the other gene mixing methods
according to the invention, in particular with the mating based method, which in a
preferred embodiment includes physical isolation of artificial chromosomes.

The host cells into which the subsets of mixed artificial chromosomes are
transferred may already contain artificial chromosomes with expression cassettes
with heterologous genes.

According to a further aspect of the invention there is provided a method of mixing
heterologous genes in expression cassettes located on artificial chromosomes, said
method comprising the steps of:

a) obtaining at least one population of cells, the cells of said at least
one population comprising a concatemer of expression cassettes of
the following formula:
[rs2-SP-PR-X-TR-SP-rs1]n
wherein
rs1 and rs2 together denote a restriction site,
SP individually denotes a spacer,
PR denotes a promoter, capable of functioning in the cells,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
n ≥ 2,
the cells differing from each other with respect to combinations of
expressible nucleotide sequences and/or promoters and/or terminators
and/or spacers,

b) isolating at least some of the cassettes of the selected cells by
cutting the concatemers with a restriction enzyme cleaving rs1-rs2,

c) amplifying at least some of the isolated cassettes,

d) assembling the expression cassettes of step c) into artificial
chromosomes, and

e) optionally transferring the artificial chromosomes into host cells.
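
Steps b) to d) can be pictured with a minimal data-model sketch. It is only an illustration under stated assumptions: a cassette is reduced to its promoter, expressible sequence and terminator (the spacers and the rs1-rs2 half-sites are implicit), re-assembly is modelled as a random draw from the amplified pool, and all names (`Cassette`, `excise`, `amplify`, `assemble`, the promoter, gene and terminator labels) are hypothetical.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple

class Cassette(NamedTuple):
    """One [rs2-SP-PR-X-TR-SP-rs1] unit; spacers and half-sites are implicit."""
    promoter: str
    gene: str
    terminator: str

Concatemer = Tuple[Cassette, ...]

def excise(concatemers: Sequence[Concatemer]) -> List[Cassette]:
    """Step b): cutting at every rs1-rs2 site releases the individual cassettes."""
    return [cassette for concatemer in concatemers for cassette in concatemer]

def amplify(cassettes: Sequence[Cassette], copies: int) -> List[Cassette]:
    """Step c): raise the copy number of each isolated cassette."""
    return [c for c in cassettes for _ in range(copies)]

def assemble(pool: Sequence[Cassette], n: int, rng: random.Random) -> Concatemer:
    """Step d): join n cassettes drawn from the pool (n >= 2, as in the formula)."""
    if n < 2:
        raise ValueError("a concatemer needs at least two cassettes")
    return tuple(rng.sample(list(pool), n))

# Hypothetical promoters, genes and terminators, for illustration only.
cell_1 = (Cassette("P1", "geneA", "T1"), Cassette("P2", "geneB", "T1"))
cell_2 = (Cassette("P1", "geneC", "T2"), Cassette("P3", "geneD", "T2"))

pool = amplify(excise([cell_1, cell_2]), copies=10)
new_concatemer = assemble(pool, n=3, rng=random.Random(0))
print([c.gene for c in new_concatemer])
```

Because the pool mixes cassettes from both cells, a re-assembled concatemer can carry gene combinations that were not present together in either starting cell.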

The concatemers can be used to make novel and non-native combinations of genes
for co-ordinated expression in a host cell. Thereby new metabolic pathways can be
generated, which may lead to the production of new metabolites, and/or to the
metabolisation of compounds, which are otherwise not metabolisable by the host
cells. The new gene combinations may also lead to metabolic pathways which
produce metabolites in new quantities or in new compartments of the cell or outside
the cell. Depending on the purpose, the selection of genes can be made completely
random based on sourcing of expressible nucleotide sequences across the different
kingdoms. However, it may also be advantageous to source genes from sources
known to have certain metabolic pathways in order to make targeted new gene
combinations. It may also be advantageous to source genes from organisms/tissues
known to have relevant properties, e.g. a specific pharmaceutical activity.

One of several advantages of the concatemers of the present invention is that the
expression cassettes can be cut out from the concatemers at any point to make new
combinations of expression cassettes. During re-assembly, further genes comprised
in similar expression cassettes may be added if desired to modify the expression
pattern. In this way, the concatemers according to the present invention present a
powerful tool in generating novel gene combinations.

One advantage of the structure of the concatemer is that cassettes can be
recovered from the host cell through nucleotide isolation and subsequent digestion
with a restriction enzyme specific for the rs1-rs2 restriction site. The building blocks
of the concatemers may thus be disassembled and reassembled at any point.

The amplification step ensures that the copy number of the cassettes is high enough
to be able to perform the re-assembly conveniently. The amplification step may
include PCR with primers that tag rs1 and rs2 and/or inserting isolated cassettes into
a vector having a cloning site compatible with rs1-rs2 and multiplying this vector in a
suitable host.

Further expression cassettes may be added for the assembly step if desired.
Conveniently, the method may be combined with screening cells with assembled
artificial chromosomes for a desired functionality(ies) and selecting cells having the
desired functionality(ies). The process may be repeated by subjecting the selected
cells to further isolation and amplification of cassettes and assembly of artificial
chromosomes. The method may also be combined with any of the other gene
mixing methods according to the invention.

In a still further aspect the invention relates to a method for mixing expressible
nucleotide sequences, said method comprising the steps of

a) obtaining at least one population of cells, the cells of said at least one population
comprising at least one expression cassette of the following formula:
[rs2-SP-PR-rs1'-X-rs2'-TR-SP-rs1]n
wherein
rs1 and rs2 together denote a restriction site,
rs1' and rs2' together denote a different restriction site,
SP individually denotes an optional spacer,
PR denotes a promoter, capable of functioning in the cells,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
n

b) isolating at least some of the expressible nucleotide sequences of the selected
cells by cutting the cassettes with a restriction enzyme cleaving rs1'-rs2', or by
amplifying the sequences with primer pairs templating sequences in rs1' and
rs2',

c) re-inserting the expressible nucleotide sequences into another similar backbone,

d) re-mixing the expression cassettes, and

e) transferring the re-mixed expression cassettes into host cells.

The method provides a way of isolating the coding sequence (including any introns)
for re-insertion into new expression constructs in order to increase the number of
expression contexts.

Preferably, the isolated expressible nucleotide sequences are inserted into primary
vectors comprising a nucleotide sequence cassette of the general formula in 5'->3'
direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']
wherein
RS1 and RS1' denote restriction sites,
RS2 and RS2' denote restriction sites different from RS1 and RS1',
SP individually denotes a spacer sequence of at least two nucleotides,
PR denotes a promoter,
CS denotes a cloning site,
TR denotes a terminator.

Expression cassettes can be isolated from these primary vectors and assembled 
into concatamers of expression cassettes with new gene combinations. 

In a further aspect the invention relates to a method of mixing heterologous genes in
expression cassettes located on plasmids, said method comprising the steps of
providing two initial populations of cells that can mate with each other,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first plasmid, the at least first plasmid comprising
both at least two expression cassettes comprising heterologous genes and at
least one selectable marker,
the selectable markers being allocated to plasmids so that each type of plasmid
from each population can be individually selected for,
mating the cells with each other, and
selecting mated cells that carry at least a subset of the selectable markers
present on the plasmids in the two initial populations.

There are several advantages associated with having the expression cassettes
located on a plasmid. Among these is the possibility of using a shuttle vector, which
makes it possible to amplify the plasmids in bacteria and later transform them into
another cellular host, such as yeast. Plasmids may also be purified in bacteria by
simply isolating total DNA from the host cells and transforming this into bacteria. As
only the plasmids have an origin of replication which is functional in a bacterium,
only the plasmids are replicated, and the plasmids are therefore selectively
amplified. The selectable markers located on the plasmids are used to select for
bacteria harbouring the plasmids. The plasmids can then be re-isolated from the
bacteria and re-inserted into the other host cells.




Preferably, in the plasmid based method, the expression cassettes are located on a
nucleotide concatemer comprising in the 5'->3' direction a cassette of nucleotide
sequence of the general formula

[rs2-SP-PR-X-TR-SP-rs1]n
wherein
rs1 and rs2 together denote a functional restriction site,
SP individually denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP individually denotes a spacer of at least two nucleotide bases, and
n ≥ 2, and
wherein at least a first cassette is different from a second cassette.

In a separate aspect of the invention there is provided a method for mixing of
heterologous genes in expression cassettes located on artificial chromosomes or
plasmids, wherein a biological selection pressure is used to ensure that only cells
with the desired functionality(ies) survive this selection pressure. The biological
selection pressure may e.g. be based on a reporter system which is transformed
into the host cells prior to transformation of the artificial chromosomes or plasmids
with the expression cassettes, or it may be a medium based selection pressure or
any of the other selection methods described in the present application. The
biological selection pressure is applied as soon as possible after insertion of the
artificial chromosomes or plasmids, just allowing the cells time to recover after the
transformation. This method more closely resembles natural evolution of cells and
will eventually cause cells to evolve the properties selected for. As with the other
methods, this method can be combined with any other method (mating, protoplast
fusion, physical isolation) according to the present invention.

Definitions

Diploid type: A mated cell that contains at least one artificial chromosome from each
parent population of a mating round. Depending on the number of artificial
chromosomes in the parent populations, mating will result in many diploid types. The
presence of artificial chromosomes of type A and B in one parent population and type
C and D in another results in the following diploid types: AC, AD, BC, BD, ABC,
ABD, ACD, BCD, ABCD.

Fused cell type: a cell resulting from protoplast or cell fusion, which carries at least
one artificial chromosome from each of the two populations that formed the fused
cells. Depending on the number of artificial chromosomes in the two initial
populations, fusion will result in many fused cell types. The presence of artificial
chromosomes of type A and B in one initial population and type C and D in another
results in the following fused cell types: AC, AD, BC, BD, ABC, ABD, ACD, BCD,
ABCD.
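
The diploid and fused cell types listed in the two definitions above can be enumerated mechanically. The following Python sketch is only an illustration; the function name `mixed_types` and the use of single letters for chromosome types are assumptions made for the example.

```python
from itertools import combinations
from typing import List


def mixed_types(pop1: str, pop2: str) -> List[str]:
    """List every combination holding at least one chromosome type from each population."""
    result = []
    all_types = pop1 + pop2
    for size in range(2, len(all_types) + 1):
        for combo in combinations(all_types, size):
            # Keep only combinations that draw on both parent populations.
            if set(combo) & set(pop1) and set(combo) & set(pop2):
                result.append("".join(combo))
    return result


print(mixed_types("AB", "CD"))
# ['AC', 'AD', 'BC', 'BD', 'ABC', 'ABD', 'ACD', 'BCD', 'ABCD']
```

For populations carrying a and b distinct chromosome types, the number of such combinations is (2^a - 1)(2^b - 1), which gives 3 x 3 = 9 for the example above.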


Protoplast: a cell from which the cell wall has been removed.

Artificial chromosomes: As used herein, an artificial chromosome (AC) is a piece of
DNA that can stably replicate and segregate alongside endogenous chromosomes.
For eukaryotes the artificial chromosome may also be described as a nucleotide
sequence of substantial length comprising a functional centromere, functional
telomeres, and at least one autonomous replicating sequence. It has the capacity to
accommodate and express heterologous genes inserted therein. It is referred to as
a mammalian artificial chromosome (MAC) when it contains an active mammalian
centromere. Plant artificial chromosome and insect artificial chromosome (BUGAC)
refer to chromosomes that include plant and insect centromeres, respectively. A
human artificial chromosome (HAC) refers to a chromosome that includes human
centromeres. AVACs refer to avian artificial chromosomes. A yeast artificial
chromosome (YAC) refers to chromosomes that are functional in yeast, such as
chromosomes that include a yeast centromere.

As used herein, stable maintenance of chromosomes occurs when at least about
85%, preferably 90%, more preferably 95%, more preferably 99% of the cells retain
the chromosome. Stability is measured in the presence of a selective agent.
Preferably these chromosomes are also maintained in the absence of a selective
agent. Stable chromosomes also retain their structure during cell culturing, suffering
neither intrachromosomal nor interchromosomal rearrangements.

Expression cassettes comprising heterologous genes: By the term "expression
cassettes" is meant controllably expressed nucleotide sequences, of the formula:

PR-X-TR

wherein

PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, capable of functioning in a cell.

The promoter and expressible nucleotide sequence may be natively associated, but
preferably the promoter is heterologous to the expressible nucleotide sequence.

Selectable marker: any gene which provides the cell with a function for which
selection can be made to ensure the presence of the selectable marker in the cell.
Typical examples of selectable markers include auxotrophic markers and drug
resistance markers. Further examples of selectable markers include markers giving
a colour or a particular morphology, which may be selected for in a flow cytometer or
by hand. Still further examples of selectable markers include resistance against
physical or chemical conditions, e.g. radiation resistance, salt resistance. These
may be selected for on the basis of survival. Other types of markers include
nucleotide probes labelled with a stain/dye molecule or preferably a fluorescent
signal molecule.

Description of the drawings

Fig. 1 shows one schematic example of mixing of genes using mating and selection.
Different shades of the artificial chromosomes (native chromosomes left out) illustrate
different types of artificial chromosomes.


Fig. 2 shows an example of multiple parameter screening for compounds
synthesised by cells, where the compounds inhibit Cox-2 and NF-kB and do not
inhibit Cox-1. It is shown that in early rounds, cells that meet one, two or all three of
the criteria are selected. In later rounds, only cells that fit all the selection criteria are
selected.
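
The tiered selection described for Fig. 2 can be pictured as a threshold on the number of criteria a cell meets. The Python sketch below is a hypothetical illustration only: the cell names, the boolean assay results and the choice of thresholds per round are assumptions, not data from the present application.

```python
from typing import Dict, List

# The three selection criteria of Fig. 2 (identifier names are illustrative).
CRITERIA = ("inhibits_cox2", "inhibits_nfkb", "spares_cox1")


def select(cells: Dict[str, Dict[str, bool]], min_hits: int) -> List[str]:
    """Return the names of cells meeting at least min_hits of the three criteria."""
    return [name for name, result in cells.items()
            if sum(result[c] for c in CRITERIA) >= min_hits]


# Hypothetical assay results for four cells.
cells = {
    "cell_1": {"inhibits_cox2": True, "inhibits_nfkb": False, "spares_cox1": False},
    "cell_2": {"inhibits_cox2": True, "inhibits_nfkb": True, "spares_cox1": False},
    "cell_3": {"inhibits_cox2": True, "inhibits_nfkb": True, "spares_cox1": True},
    "cell_4": {"inhibits_cox2": False, "inhibits_nfkb": False, "spares_cox1": False},
}

early = select(cells, min_hits=1)   # early round: meeting any criterion is enough
late = select(cells, min_hits=3)    # later round: all three criteria are required
print(early, late)
```

Raising `min_hits` from round to round keeps partially successful cells available for mixing in early rounds while converging on cells that satisfy every criterion.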





Fig. 3 shows an example of a multiple parameter screen including S. aureus growth
inhibition, DNA polymerase III inhibition and P450 inhibition. The screen is
assembled by, for example, transforming a library of producer strains with GFP
reporter systems for a few selected human P450s and for recombinant Bacillus
subtilis DNA Pol III. The library is then plated and overlaid with an MRSA strain.
The compounds have to cross the producer's cell wall and reach the MRSA strain;
thus the screen will also select for compounds that have a reasonable solubility
profile. Producer cells in zones cleared of MRSA cells and which produce the
desired combination of fluorescent colours are selected.

Fig. 4 illustrates a multiple parameter screen set-up for a cancer chemoprotectant. In
the assay, a producer species library is encapsulated so that on average each
capsule has 1 cell and allowed to grow for a few generations. These clonal lines are
then double encapsulated with a permeabilised yeast that contains a human DNA
Topo II α reporter system. The gel droplet environment contains etoposide (a
poison) and a DNA double strand break stain. Gel droplets where the yeast cells in
the outer layer have survived and that do not fluoresce or are stained are selected.

Fig. 5 shows a multiple parameter screen set-up with gel encapsulation of a
producer species library reporting RXR-RXR activation together with a mammalian
cell line reporting PPARγ-RXR activation as well as P450 inhibition. Gel droplets that
indicate PPARγ-RXR activation but not RXR-RXR or P450 inhibition are selected.

Fig. 6 shows an example of a multiple parameter screen for absorption and a
pharmacological activity: By using a dual culture system and timing the cell
selection, it is possible to select producer cells that have the desired
pharmacological activity and a good absorption profile.

Fig. 7 shows an example of a screening system which minimises the number of
false positives generated by compounds that are rapidly metabolised by the human
DMEs and also leads to the discovery of compounds that are active after being
metabolised and which would otherwise remain undiscovered.

Fig. 8 shows a schematic representation of a screening system of the present
invention to evaluate target activity, metabolism by DMEs and cytotoxicity: Using a
double gel encapsulation system where in the first droplet are clonal lines of the
producer species transformed with the pharmacological target and DMEs, and in the
second droplet are hepatocytes, it is possible to screen for target activity, DME
metabolism and hepatotoxicity.

Fig. 9 shows a flow chart of the steps leading from an expression state to
incorporation of the expressible nucleotide sequences in an entry library (a
nucleotide library according to the invention).

Fig. 10 shows a flow chart of the steps leading from an entry library comprising
expressible nucleotide sequences to evolvable artificial chromosomes (EVAC)
transformed into an appropriate host cell. Fig. 10a shows one way of producing the
EVACs which includes concatenation, size selection and insertion into an artificial
chromosome vector. Fig. 10b shows a one step procedure for concatenation and
ligation of vector arms to obtain EVACs.

Fig. 11 shows a model entry vector. MCS is a multi cloning site for inserting
expressible nucleotide sequences. Amp R is the gene for ampicillin resistance. Col
E is the origin of replication in E. coli. R1 and R2 are restriction enzyme recognition
sites.

Fig. 12 shows an example of an entry vector according to the invention, EVE4.
MET25 is a promoter, ADH1 is a terminator, f1 is an origin of replication for
filamentous phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few
nucleotides deriving from the multiple cloning site, MCS. SrfI and AscI are restriction
enzyme recognition sites. Other abbreviations, see Fig. 11. The sequence of the
vector is set forth in SEQ ID NO 1.


Fig. 13 shows an example of an entry vector according to the invention, EVE5. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few nucleotides
deriving from the multiple cloning site, MCS. SrfI and AscI are restriction enzyme
recognition sites. Other abbreviations, see Fig. 11. The sequence of the vector is set
forth in SEQ ID NO 2.

Fig. 14 shows an example of an entry vector according to the invention, EVE8. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer3 is a 550 bp fragment of lambda phage DNA.
Spacer4 is an ARS1 sequence from yeast. SrfI and AscI are restriction enzyme
recognition sites. Other abbreviations, see Fig. 11. The sequence of the vector is set
forth in SEQ ID NO 3.



Fig. 15 shows an example of an entry vector according to the invention, EVE9.
Met25 is a promoter, ADH1 is a terminator. Spacer 5 and 6 are lambda phage DNA.
SEQ ID NO 5.

Fig. 16 shows a vector (pYAC4-AscI) for providing arms for an evolvable artificial
chromosome (EVAC) into which a concatemer according to the invention can be
cloned. TRP1, URA3, and HIS3 are yeast auxotrophic marker genes, and AmpR is
an E. coli antibiotic marker gene. CEN4 is a centromere and TEL are telomeres.
ARS1 and PMB1 allow replication in yeast and E. coli respectively. BamH I and Asc
I are restriction enzyme recognition sites. The nucleotide sequence of the vector is
set forth in SEQ ID NO 4.


Fig. 17 shows the general concatenation strategy. On the left is shown a circular
entry vector with restriction sites, spacers, promoter, expressible nucleotide
sequence and terminator. These are excised and ligated randomly.



Lane   1      2     3     4     5    6    7    8    9
F/Y    100/1  50/1  20/1  10/1  5/1  2/1  1/1  1/2  1/5

Legend: Lane M: molecular weight marker, λ-phage DNA digested with PstI. Lanes
1-9: concatenation reactions. Ratio of fragments to YAC arms (F/Y) as in table.




Fig. 18a and 18b illustrate the integration of concatenation with synthesis of
evolvable artificial chromosomes and how concatemer size can be controlled by
controlling the ratio of vector arms to expression cassettes, as described in example
14.


Fig. 19. EVAC gel. Legend: PFGE of EVAC containing clones.
Lanes a: yeast DNA PFGE markers (strain YNN295), b: lambda ladder, c: non-
transformed host yeast, 1-9: EVAC containing clones. EVACs in size range 1400-
1600 kb. Lane 2 shows a clone containing 2 EVACs sized ~1500 kb and ~550 kb
respectively. The 550 kb EVAC is comigrating with the 564 kb yeast chromosome,
resulting in an increased intensity of the band at 564 kb relative to the other
bands in the lane. Arrows point up to EVAC bands.

Fig. 20 shows an example of generation of an EVAC containing cell population.
EVACs (Evolvable Artificial Chromosomes) are artificial chromosomes composed of
concatemers of expression cassettes containing heterologous DNA, so that each
gene is under the control of an externally controllable promoter. Large numbers of
heterologous genes from multiple sources can thus be combined in a single host
cell.


Fig. 21 shows the general principle for screening EVAC containing cell populations.
The cell population is amplified and subjected to a panel of screens that
are relevant to a desired functionality. Positive subpopulations are selected.

Fig. 22 shows how cell populations evolve through a tiered set of selection
conditions, always taking the best performing cell populations further in the process
until an optimal functionality/property is evolved.

Fig. 23 shows a general screening strategy. Independent populations are subjected
to the same set of screens, and genetic material from the different selected
sub-populations is combined together with novel genetic diversity introduced between
selection rounds.






Fig. 24 shows physical remixing of EVACs. EVACs are isolated from the host and
used for transformation of either empty host cells or for transformation of host cells
already containing EVACs to obtain new combinations of EVACs in each host cell.




Fig. 25 shows one example of evolution. Cells that are resistant to a poison may be 
selected in liquid media. The surviving cells are cells containing EVACs that result in 
the production of compounds that prevent the poison from interacting with its target.


Fig. 26 shows how an evolution programme based on a screen for compounds that
activate (or prevent activation of) a reporter system may be designed. Using the
appropriate marker (e.g. GFP) positive clones can be selected using e.g. flow
cytometry.


Fig. 27 shows an example of controllable gene expression in a cell population
containing EVACs enriched in genes that code for carotenoid synthetic enzymes. The
expression cassettes contain either a Met25 or a CUP1 promoter. Orange and red
colonies are obtained as a function of the promoter activation. Intensity of colour and
number of coloured colonies increases in the following order: CUP + Met > CUP >
Met. Uninduced colonies are white.

Detailed Description of the Invention 

The following provides a background description on how to apply the gene mixing
methods according to the invention and how to use these in the directed evolution
of cells to acquire new functionalities.

The present invention relates to methods of mixing of genes for the purpose of
evolving cells that produce compounds, novel substances and/or metabolic
pathways having at least one desired functionality. The mixing and/or evolution may
lead to the production of novel molecules of commercial value, such as
pharmaceuticals, cosmetics, flavours, other food and animal feed ingredients,
agricultural chemicals, colouring agents, diagnostic markers, industrial chemicals and
intermediates for industrial purposes.

By "evolution of a cell" is meant change of a cell's phenotype towards a novel
phenotype due to expression of a novel combination of genes. By "evolution of a
composition" is meant change of the properties of a composition due to a novel
combination of cells expressing a novel combination of genes.

In the following the term mixing is mainly used for describing the assembly of
expression cassettes and the term remixing is mainly used for the process referring
to further mixing of expression cassettes as described in the claims. However, as
both terms encompass mixing, the term mixing is also used for the remixing steps.

In seeking to evolve molecules with defined pharmaceutical, industrial or nutritional
properties one must have a method of selecting for those genetic patterns that
encode phenotypes that are consistent with these properties.

Each cell in a cell population, given that it is genetically different from other cells, 
has an intrinsic variability that can potentially express itself in one or more ways. For 
the purposes of the current invention the term Output shall be taken to mean a
property of the cell that is consequent to the expression of one or more expression 
cassettes. Optionally the property may be consequent to both the expression of one 
or more expression cassettes and the expression of a certain set of host genes. 

Outputs can be measured according to various different criteria. These criteria may 
be directly or indirectly linked to the functional or structural properties that are being
optimised. Alternatively they may be inversely linked to functional or structural 
properties that are not desired. 



Outputs can be measured either directly or by means of a reporter construct. For the
purposes of this document the term Reporter Construct shall be taken to mean a
genetic or molecular device for measuring whether a given cell or subset of cells in a
cell population varies in respect of a given output from other cells or subsets of cells in
the cell population. Example reporter constructs include a genetic construct that
produces a fluorescent protein in response to the activation of a transcription factor
by an output. Another example of a reporter construct is a coloured enzyme
substrate, to which an enzyme is added that converts the substrate to another
molecule with a different colour. Should the cell produce an output that inhibits the
enzyme, the colour change will not occur.






Outputs that can be measured without a reporter construct include, without limitation,
the survival of cells subjected to the screening criteria, cells able to metabolise a
predetermined substance, cells able to produce a substance that preferentially
absorbs electromagnetic radiation at one or more frequencies, cells having
enzymatic efficacy in the media etc.

The reporter construct can be placed proximal either before or after the expression
construct is engineered into the cell. Methods of incorporating the reporter construct
into a proximal location include but are not limited to standard transformation
techniques, the mating of two different yeast mating types, or systems providing
physical proximity between cell and reporter construct, for example gel microdroplet
co-encapsulation of cell and reporter construct.

The term Proximal shall be taken to mean a location that is either in the same cell as
the expression construct or sufficiently close to said cell such that the concentration
of a molecule or molecules diffusing from an intact or lysed cell, or being actively
pumped from the cell, is at least one picomole in the vicinity of the location.

Outputs of cells that may be measured either by proximal reporter constructs or by
other means include, but are not limited to:

• Novel spectral properties 

• Induced cytochrome oxidase activity

• Changed size, morphology, stickiness or adhesive properties or lack thereof 

• Ability to grow on substrates they cannot normally grow on 

• Ability to grow on sublethal substrates 

• Ability to grow in the absence of normal essential requirements 
• Ability to grow on media comprising one or more inhibitors

• Ability to grow under changed physical conditions, such as temperature, 
osmolarity, electromagnetic radiation including light of certain wavelengths. 

• Ability to grow under magnetic field of certain force. 

• Secretion or the lack of it from the cell 

• The inhibition or prevention of inhibition of an enzyme

• The activation of a receptor. 

• The prevention of an activating molecule binding to a receptor. 

• The inhibition or promotion of binding of small molecules or proteins to
nucleic acid or peptide sequences.

• The inhibition or promotion of transcription, translation or post-translational
processing.

• Changes in the transport or localisation of molecules within the cell or within
organelles.

• Changes in the DNA content or morphology of the cell.

• The production of small molecules with certain properties that allow their
selective isolation (e.g. all the chromatography principles available to the
skilled practitioner).

• The production of small molecules with certain spectroscopic properties
(defined broadly to include visible light, microwaves, IR, UV, X-ray, etc.).

• Changes in the morphology of the cell, including the prevention or promotion
of cell differentiation.

• The induction of apoptotic pathways.

• Chemical indicator. 






Diverse Genetic Patterns 



Given that evolution is a statistical process it is necessary to provide sufficient
genetic variation on which selection processes can act. In the present invention, this
comprises two elements:

• Providing a sufficiently large and diverse population 

• Controlling the genetic basis of the diversity and how it expresses 

Selection requires genetic diversity on which to operate. Thus the first requirement
of the current invention is to provide a population of cells that embodies a genetic
diversity. The term "genetic diversity" means that substantially all cells are different,
in that they comprise different genes, and/or identical genes under control of
different control systems, such as different promoters, such that almost each cell
initially represents a genotype not represented in any of the other cells. Of course
due to cell division a few cells may be substantially identical.

The term "Cell Population" shall be taken to mean a population of cells where at
least 10^4 cells, such as at least 10^5 cells, such as at least 10^6 cells, such as at least
10^7 cells, such as at least 10^8 cells, such as at least 10^9 cells, such as at least 10^10
cells, such as at least 10^11 cells, such as at least 10^12 cells in the population
represent a genotype not represented in any of the other cells.

Thus, the principle of the evolution method according to the invention is to obtain a
population of cells having a very high genetic diversity.

One particular embodiment of this principle is to produce cells with combinations of
concatemers comprising cassettes with expressible nucleotide sequences from a
number of different expression states, which may be from any number of unrelated
or distantly or closely related species, or from species from different kingdoms or
phyla, so that novel and random combinations of gene products are produced in one
single cell.

By inserting novel genes into the host cell, and especially by inserting a high number
of novel genes from different expression states, such as from a wide variety of
species, into a host cell, the gene products from this array of novel genes will interact
with the pool of metabolites of the host cell and with each other and modify known
metabolites and/or intermediates in novel ways to create novel compounds. Due to
the high number of substantially different cells that can be generated using the
methods according to the present invention, for example at least 10^4 cells, such as
at least 10^5 cells, such as at least 10^6 cells, such as at least 10^7 cells, such as at
least 10^8, such as at least 10^9, for example at least 10^10, such as at least 10^11, it is
more or less inevitable or at least likely that such large populations will lead to a
sub-population having such an interaction. The sub-population having such
interaction may comprise at most 10^10 cells, such as at most 10^9 cells, such as at
most 10^8, such as at most 10^7 cells, such as at most 10^6 cells, such as at most 10^5
cells, such as at most 10^4 cells, such as at most 10^3 cells, such as at most 10^2 cells
or just 10 cells.



Generation of Novel Genetic Compositions 

It is a requirement of evolutionary processes that new patterns are generated either
in parallel to or sequential to selection steps. In systems where the patterns are
based on genetic elements this requires that either new genetic elements are
introduced or new combinations of existing genetic elements are created or both.

In the present invention new patterns can be achieved through one or more of the
following processes. The term combining or remixing shall be taken to mean a
process of generating new combinations of expression constructs using one or more
of these approaches. The combination or remixing may be conducted at any step of
the selection process and a preferred timing is when cells having elements of the
predetermined functionality have been found in at least one of the compositions, and
preferably in at least 0.1%, such as at least 1%, such as at least 2%, such as at
least 5%, such as at least 10% or at least 50% of compositions. The term Daughter
Population shall be taken to mean a cell population that is predominantly genetically
descendant from those cells in one or more cell populations that had a fitness score
above a certain threshold and that is further characterised by most of the cells in the
daughter population having been generated through a remixing step.
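
The timing rule and the fitness threshold described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the function names, the representation of compositions as booleans, the cell names and the numerical fitness scores are assumptions and do not come from the present application.

```python
from typing import Dict, List, Sequence


def ready_to_remix(has_hit: Sequence[bool], min_fraction: float = 0.001) -> bool:
    """True when at least one composition, and at least min_fraction of all
    compositions, contain cells with elements of the desired functionality."""
    hits = sum(has_hit)
    return hits >= 1 and hits / len(has_hit) >= min_fraction


def parents_for_daughter(fitness: Dict[str, float], threshold: float) -> List[str]:
    """Cells whose fitness score lies above the threshold; these would be
    carried into the remixing step that generates the daughter population."""
    return sorted(name for name, score in fitness.items() if score > threshold)


print(ready_to_remix([True, False, False]))
print(parents_for_daughter({"cell_a": 0.9, "cell_b": 0.2, "cell_c": 0.7}, threshold=0.5))
```

The default `min_fraction` of 0.001 corresponds to the 0.1% figure in the text; the other percentages listed there could be passed in the same way.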

In principle the combination or remixing may be conducted by at least the following
approaches: physical isolation and remixing of expression cassettes, physical
isolation and remixing of artificial chromosomes containing expression cassettes,
sexual crosses, cell- or protoplast fusion (vide Hugerat Y, Spencer F, Zenvirth D,
Simchen G (1994). Genomics 22(1), p. 108-117), and YAC-duction (vide Curran BP,
Bugeja VC (1996). Methods Mol. Biol. 53, p. 45-49).


Physical isolation of the expression cassettes and subsequently mixing the
cassettes may be used together with the mating and protoplast/cell fusion methods
of the present invention. One advantage of this approach is that any accumulating
host mutations are removed by the remixing of genes into new host lines. Reporter
genes can also be introduced as part of this process, allowing for the introduction of
intracellular reporter assays. The remixing is preferably carried out in vitro by
removing the expressible sequences from at least two different cells, combining the
individual expressible sequences in vitro, and introducing at least two combined
expressible sequences into at least two cells.


Due to the common structure of the expression cassettes according to a preferred
embodiment of the invention, these may easily be excised from the host cells again
using a restriction enzyme specific for the rs1-rs2 restriction site. According to the
present invention the enzyme specific for the rs1-rs2 restriction site is preferably a
rare cutter; therefore the likelihood of cutting host genomic DNA fragments with a
size similar to the size of the expression cassettes is very limited. After excision the
expression cassettes may be mixed with other expression cassettes of similar
structure and be re-concatenated and re-inserted into another host cell in another
combination, creating a greater diversity during the evolution steps.


The combination of expressible sequences may of course also be a combination of
full length chromosomes in the cells, such as combination of artificial chromosomes.
Combination of the artificial chromosomes may be achieved in at least 4 ways
depending on the host cells. These are physical isolation, crosses, protoplast fusion
and YAC-duction as described herein.

An alternative way of physically remixing expression cassettes is to isolate the
artificial chromosomes from one or more cell populations and re-transform new host
cells. The host cells may or may not already contain artificial chromosomes
containing expression cassettes.

Addition of new genetic material. 

The remixing is preferably conducted with addition of new genetic material from
another cell composition. The other composition may be chosen from compositions
capable of expressing at least one predetermined phenotype, such as a protein or a
metabolite, or it may be chosen at random.

In one embodiment it is desirable to conduct selection in a series of isolated
populations that are then brought together once they have independently evolved
useful traits. In this manner the use of independent selections for the same phenotype
provides different genetic backgrounds (a form of parallel evolution) that can then
ideally act synergistically with each other.

In another embodiment the result of selection on two or more compositions is mixed
at a certain step of evolution to create further modified compositions when aiming for
at least one cell having the desired functionality.






Recombination of the expressible sequences, i.e. changes of the genetic material by
for example cross-over, may optionally be avoided, due to the construction of the
genetic inserts, in particular spacer sequences, as well as due to a general attempt
to suppress recombination in the cells. Thereby combination of the genetic material
is favoured, leading to combination of intact genes or cDNA material, without the risk
of destroying the function of the genetic material due to recombination.


After having obtained daughter populations exhibiting the desired functionality, the 
daughter population may then be subjected to further steps of screening and 
selection in order to optimise the cells. 

Novel Molecules and Pathways

The aim of the evolution method according to the present invention is to evolve cells 
capable of producing new substances, such as new metabolites, new proteins, 
and/or having new pathways. 


Thus, in a further aspect the present invention relates to a substance produced by
the cells evolved according to the present invention, said substance being
metabolites, proteins, carbohydrates, poly- and oligosaccharides, and ribonucleic
acids. Since some of the interactions that produce the novel phenotypes are
mediated by enzymes it is likely that the result will include novel compounds with
chiral centres, which are especially difficult to produce via chemical synthesis.

Creation of novel pathways may lead to the capability of creating cells capable of
metabolising, i.e. converting, a compound which is not metabolisable by the native,
un-evolved cell. Thus, in particular the substance is a metabolite.

MULTIPLE PARAMETER SCREENING 

For a compound to be useable as a drug it must fulfil multiple functional
requirements. It must interact with the target(s) and affect the function of the target in
the desired manner. At the same time it should not interact with many other (often
similar) targets or have major non-specific effects. It must further have the right
physical-chemical parameters and be metabolised by the body in an acceptable
manner.

35 
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Because of this intrinsic difficulty and complexity, the process of discovering and 
developing dnjgs has a very poor success rate and is thus extremely expensive 
($600mn per successful compound) and very time consuming (c. 8-12 years from 
discovery to clinic). Only c. 1 in 15 primary screens produce a compound that makes 
5 It into pre-cHnical development and onty 1 1n 10 of these compounds then make it to 
market The average pharmaceutical company spends 250 man-years of research 
and development effort for every compound that enters the clinic. Most phannaceu- 
tical companies are. In consequence, falling to launch new drugs at the rate they 
require to satisfy their investors. 
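
The two attrition figures quoted above can be combined into a single overall rate.
The following Python sketch multiplies the stage rates given in the text (1 in 15 and
1 in 10). Treating the two stages as independent, and the function name itself, are
assumptions made purely for illustration.

```python
from fractions import Fraction


def overall_success_rate(stage_rates):
    """Multiply per-stage success rates, assuming the stages are independent."""
    total = Fraction(1)
    for rate in stage_rates:
        total *= rate
    return total


# Rates quoted in the text:
# primary screen -> pre-clinical development, pre-clinical development -> market.
rates = [Fraction(1, 15), Fraction(1, 10)]
overall = overall_success_rate(rates)
print(f"roughly 1 in {overall.denominator} primary screens leads to a marketed compound")
```

Under that independence assumption, the quoted figures imply that only about one
primary screen in 150 ends in a marketed compound.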

An alternative to the current process is the evolution of small molecule compounds
towards multiple properties simultaneously, with these properties being related,
either directly or indirectly, to the therapeutic target(s) the small molecule has to
interact with, the targets it should not interfere with, the ADMET properties it should
fulfil, etc.

Multiple Pharmacological Activities

Due to the vast number of targets, and of relationships between those targets,
that are currently known, it is not within the scope of the present invention to describe
all known targets and their correlations. Table 1 discloses a list of relevant
pharmacological targets.

Table 1: Drug targets 

3β-hydroxysteroid dehydrogenase
3-hydroxy-3-methylglutaryl coenzyme A
S-adenosyl homocysteine hydrolase
5-HT3 receptor
5-HT4 receptor
23S rRNA of the 50S ribosomal unit
50S rRNA from 50S ribosomal unit
50S ribosomal unit binding site
α2 antiplasmin
α-adrenergic receptor
α-subunit of Na+/K+ ATPase (3 isoforms)
α-amylase
α-glucosidase
ACTH receptor
Adenosine deaminase
Adrenocortical steroid synthesis
Adrenocorticosteroid receptor
Adrenergic receptor β1, β2
Adrenocorticotropic hormone
Androgen receptor
Angiotensin-converting enzyme (ACE)
Angiotensin II formation
Angiotensin II receptor
Antiplatelet/antithrombotic agent
Arginine vasopressin receptor
Angiotensin receptors, AT1, AT2
ATP-sensitive K+ channel
Anticoagulant protein C
Anticoagulant protein S
Androgen receptor
Apoptosis
Aminoacyl tRNA site on 30S ribosomal unit (tetracycline)
Acetylcholinesterase
Adrenergic receptors α1, α2, β1, β2, β3
Aromatase
ATP sensitive K+ channels
Ascorbic acid

β-amyloid
β-adrenergic receptor
β-lactamase
β-subunit of DNA-dependent RNA polymerase
β-adrenergic receptors, β1
β-tubulin subunit of microtubules
Benzodiazepine receptor
Butyrylcholinesterase
Bradykinin receptors, B1 and B2
Carbonic anhydrase, type IV, II
Ca2+ channel
Ca2+ channel, voltage-activated T-type
Catechol-O-methyltransferase
Calcitonin
Cell surface receptors for sulfonylureas on pancreatic β cells
Cell surface receptors for glitinides on pancreatic β cells
Cholecystokinin (CCKa, CCKb)
Choline acetyltransferase
Cholinesterase
Carnitine
Calcineurin
Corticosteroid nuclear receptor
Cyclophilin, cyclosporin binding protein
CD3 glycoprotein on T lymphocytes
CD33 receptor
CD20 receptor
CG-rich DNA (actinomycin)
Coagulation factor II, VII, IX, X
Corticosteroid adrenocorticotropin receptors
Cyclooxygenase 1, 2 (COX-1, COX-2)
Cyclic nucleotide phosphodiesterase
Cyclooxygenase
Cytochrome P450 reductase
Cytochrome P450 11β (11β hydroxylase)
Cytochrome P450 17α, C17-20 lyase
Cytochrome P450 aldo, aldosterone synthase
Cytochrome P450 side chain cleavage (scc) enzyme
Cytochrome P450-dependent sterol 14α-demethylase

D-alanyl D-alanine synthetase
Dihydropteroate synthetase
Deoxycytidine kinase
Dihydroorotate dehydrogenase
Dihydrofolate reductase
Dopamine D1-D5 receptors
DNA chain elongation factor
DNA cross-linking
DNA-dependent RNA polymerase
DNA gyrase, subunit a
DNA methylation
DNA polymerases I+III
DNA primase
DNA topoisomerase
DNA alkylation
DNA topoisomerase IV
DNA alkylation (oxamniquine)
Erythropoietin
Endo-β-D-glucuronidase
Estrogen receptor
Factors VII, VIII
Fusion protein (respiratory syncytial virus)
FKBP, tacrolimus binding protein, FK506 binding protein
Folic acid
Follicle-stimulating hormone (FSH)
FSH receptor



Glycerol phosphate oxidase
GABAA receptor (6 α variants, 3 β, 2 δ, 3 γ variants)
GABA transaminase
GABAA-associated ion channel
Glutamic acid decarboxylase
Glutamate/aspartate receptors, AMPA, GLU 1-4, KA, GLU 5-7, NMDA 1, mGLU 1-7
Glycinamide ribonucleotide transformylase
Granulocyte colony-stimulating factor receptor
GHRH receptor
Glucagon receptor
Glucoamylase



Glucocorticoid receptor (GR)
GnRH receptor
Gonadotropin releasing hormone (GnRH)
Guanylyl kinase
G-protein coupled adenosine receptor
Ganglionic adrenergic neurons/norepinephrine transporter
Guanylate cyclase (nitroprusside)
Guanylyl cyclase (NO)
Granulocyte colony-stimulating factor
Granulocyte-macrophage colony-stimulating factor
Growth hormone receptor
Growth hormone-releasing hormone (GHRH)
Glycine receptor α, β

H+, K+ ATPase, proton pump
H1 histamine receptor
H2 histamine receptor
HCl secretion by gastric cells
Helicase
HIV protease
HSV thymidine kinase
Hemoglobin protease
Heparin antagonist
Hypoxanthine-guanine phosphoribosyl transferase
Her-2 receptor
Histamine receptors H1, H2, H3
Hepatic sulfotransferase as a catalyst



Intercellular adhesion molecule 1
Interleukin 1 receptor
Interleukin (IL-1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12)
Interleukin-2 receptor
IGF-1 receptor, IGF-2 receptor
Iodothyronine 5'-deiodinase, type 1, type 2
Influenza A virus M2 protein
Inosine 5'-phosphate dehydrogenase
Insulin-like growth factor 1
Interleukin-2 receptor
Inosinate dehydrogenase
Interferon α
Interferon α receptor
Inosine monophosphate dehydrogenase
Integrase
Interferon α
Interferon α receptor
Interferon γ
Insulin
Insulin-like growth factor (IGF-1, IGF-2)
Insulin receptor, α and β subunits
Insulin transporter
Kallikrein, aprotinin, C-esterase, α2 macroglobulin
Kinin

L-alanyl racemase
L-aromatic amino acid decarboxylase
L-type voltage-sensitive Ca2+ channel
Leukocyte integrins
Leukotriene A hydrolase
Leukotriene B4 receptor
Leukotriene C4 receptor
Leukotriene C synthase
Leukotriene D4/E4 receptor
Lipocortin (protein), inhibits phospholipase A2
Lipoxygenases (12-lipoxygenase (platelets), lipoxygenase (leukocytes))
LH/choriogonadotropin (CG) receptor
Luteinizing hormone (LH)
Lactamase
Lipoprotein lipase

M1 receptor, muscarinic cholinergic
μ and δ receptor in gastrointestinal tract
Macrophage colony-stimulating factor
Microbial dihydrofolate reductase
Microtubular protein
Mineralocorticoid receptor
Mineralocorticoid receptor (MR)
Monoamine oxidase (MAO)-A
Monoamine oxidase (MAO)-B
Muscarinic receptor, M1, 3 subunits
Muscarinic receptor, M2, 3 subunits
Muscarinic receptor, M3, 3 subunits
Muscarinic receptor, M4, 3 subunits
Mycobacterial RNA polymerase
N-acyl hydrolase
Na+ channel, α1, β1, β3
Na+ channel α, β, γ
Na+/Cl- symporter
Na+/K+/2Cl- symporter
Niacin receptor
Nicotinic acid
Nicotinic receptor
Nicotinic cholinergic receptors, muscle Nm α, β, δ, γ, ε
Nicotinic cholinergic receptors, neuronal, Nn α2, α3, α4, α5, α6, α7, α8, α9, β2, β3, β4
Neuraminidase
Neuropeptide Y, Y1, Y2 receptors
Noradrenaline transporter



Opioid receptors jina, 8,^, kv3 
Oxytocin & receptor
Platelet-derived growth factor
Parathyroid hormone (PTH)
Peroxidase
Progesterone receptor
Prolactin
Prolactin receptor
Parasite β-tubulin
Parasite dihydrofolate reductase
Parasite glutamate gated Cl- channel
Penicillin-binding protein 1a (PBP 1a, 1b), transpeptidase
PBP 2a, 2b
PBP 3, 4, 5, 6, 7
Platelet glycoprotein IIb/IIIa (fibrinogen receptor)
Plasma protein transferrin (β1 glycoprotein)
Pyridoxine receptor
Penicilloyl enzyme
Peptidyl site of the 50S ribosomal unit
Primase
Phosphodiesterase (type IV, cyclic nucleotide phosphodiesterase)
Phospholipase A2, C
Platelet-activating factor
Prostacyclin synthase
Plasmodial heme polymerase
Progesterone receptor
Pyridoxine
Phospholipase Cβ
Purine receptors, P1 (A1, 2a, 2b, 3), P2X, P2Y
Peroxisome proliferator-activated receptor
Pancrelipase
Potassium channel
Prostaglandin 15-OH dehydrogenase
Prostaglandin D-DP receptor
Prostaglandin E1, E2, E3-EP receptor
Prostaglandin F-FP receptor
Prostaglandin I2-IP receptor
Prostaglandin I2 (PGI2) receptor
Prostaglandin F2 receptor
Prostaglandin synthetase
Prostaglandin receptor



Reverse transcriptase
Ribosomal protein from 50S ribosomal unit (streptomycin)
RhO X H / / 

Riboflavin receptor
Retinoic acid α, X receptors
Ribonucleoside diphosphate reductase
Ribonucleotide reductase
Somatostatin
Somatostatin receptors, several
Steroid 5α-reductase 1, 2
Sucrase
Squalene epoxidase
Stem cell factor, c-kit ligand
Serotonin receptors (5-HT) 5-HT1A-F, 5-HT2A-C, 5-HT3, 5-HT4-7
Succinic semialdehyde dehydrogenase
Spindle formation
Scission of DNA
Secretion of vasopressin K receptor

Topoisomerase I, II, III, IV
Tubulin
Thrombopoietin
Thrombin
Tissue plasminogen activator
Thymidylate synthetase
Tachykinins, NK1, NK2, NK3
Tryptaminergic receptor
Thromboxane A2 TP receptor, platelet and non-platelet
Thromboxane synthase
Thyroid-stimulating hormone (TSH) receptor, TRα 1,2, TRβ 1,2
Tumor necrosis factor receptor
Trypanothione reductase
Type I cyclic nucleotide phosphodiesterase
Type III cyclic AMP phosphodiesterase
Type V cyclic nucleotide phosphodiesterase
Transpeptidase
Thymic lymphocyte antibodies
Tumour necrosis factor alpha
Thiamine
Uridine monophosphate pyrophosphorylase
Vascular cellular adhesion molecule 1 receptor
Vasopressin receptors V1a, V1b, V2
Viral DNA polymerase
Vitamin A nuclear receptor
Vitamin E
Vitamin K & receptor
Vitamin B12 receptor
Vitamin D nuclear receptor
Voltage-activated Ca2+ channel, L-type

Below are examples of diseases and the different targets involved in these
diseases. Outline examples are also given of how new potential drugs for
these targets would be screened using the present invention.



1. Disease target: Bacterial infections (inhibition of DNA polymerase III, P450
inhibition and multi-drug resistant S. aureus growth inhibition)

The widespread emergence of resistance has significantly limited the efficacy of
classical antibiotic therapy for bacterial disease. Fuelled largely by the excessive
and often unnecessary use of antibiotics in humans and animals, antibiotic
resistance has resulted in increased patient morbidity, mortality and overall cost of
health care. Methicillin-resistant Staphylococcus aureus (MRSA) is now the most
prevalent nosocomial pathogen in the United States, and the enterococci, as
opportunistic pathogens, are among the top four causes of nosocomial infection.
Indeed, the percentage of enterococcal isolates resistant to essentially every
antibiotic, including vancomycin, continues to increase. Thus a premium is placed upon
the discovery of inhibitors that function by a novel, or at least different, mechanism
than currently approved antibiotics, as these would be expected to circumvent current
bacterial resistance mechanisms.

S. aureus is a very important human pathogen and has favorable growth
characteristics for use in high-throughput screening. Use of an antibiotic-resistant
strain will a priori select for hits that have activity against a multi-drug resistant strain.

DNA Polymerase III is a DNA polymerase-exonuclease (Pol-Exo) that is essential
for the replicative DNA synthesis of Gram positive organisms. Since DNA Pol III is
essential for the replication of Gram positive bacteria, the inhibition of DNA Pol III
offers a specific and alternative way to treat antibiotic resistant Gram positive
bacteria.

Many patients with severe disease may be administered multiple anti-infectives as
well as other drugs to treat (non-infectious) underlying disease. In this case, drug
classes that are not metabolized via the major P450 liver enzymes are preferable.

Desired therapeutic profile: 

• Gram positive-specific: Systemic administration of agents with a very broad
spectrum has the undesired effect of creating resistance in the normal host
gastrointestinal flora. Therefore, more disease-specific antibiotics might have an
advantage in gaining hospital formulary approval and overall wider acceptance.

• Orally-active: The ideal drug candidate would be orally-active with additional
formulations for intravenous use. Multiple dosing is acceptable; however, anything
approaching continuous infusion requires very careful consideration. Improvement
on, or equivalence with, dosing regimens of competitive therapies is important.

• Safety: The ideal drug candidate would be microorganism-specific and devoid of
significant side-effects and drug interactions within at least 10-fold of Cmax in the
therapeutic dosing range.

Multiple parameter screens:

A multiple parameter screen would thus include S. aureus growth inhibition, DNA
Polymerase III inhibition and P450 inhibition. A screen could be assembled by, for
example, transforming a library of producer strains with GFP reporter systems for a
few selected human P450s and for recombinant Bacillus subtilis DNA Pol III. The
library would then be plated and overlaid with an MRSA strain. An assay where
the compounds have to cross the producer's cell wall and reach the reporter strain
will also select for compounds that have a reasonable solubility profile.

Figure 3 exemplifies such a multiple parameter screen, in which producer cells that
lie in zones cleared of MRSA cells and that produce the desired combination of
fluorescent colours would be selected.
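
The selection rule for this screen (MRSA cleared, DNA Pol III inhibited, human P450s
not inhibited) can be written as a simple filter. The Python sketch below is only an
illustration: the readout names, the signal scale (1.0 meaning an uninhibited
reporter), the thresholds and the two P450 isoforms shown are all assumptions and do
not come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ColonyReadout:
    """Hypothetical per-colony measurements from the plate assay."""
    mrsa_zone_cleared: bool   # clear zone in the MRSA overlay around the colony
    pol3_signal: float        # Pol III reporter signal, 1.0 = uninhibited (assumed scale)
    p450_signals: Dict[str, float] = field(default_factory=dict)  # one value per P450 reporter


def is_hit(readout: ColonyReadout, pol3_max: float = 0.5, p450_min: float = 0.8) -> bool:
    """Keep colonies that clear MRSA and inhibit Pol III without inhibiting any P450."""
    inhibits_pol3 = readout.pol3_signal <= pol3_max
    spares_p450 = all(s >= p450_min for s in readout.p450_signals.values())
    return readout.mrsa_zone_cleared and inhibits_pol3 and spares_p450


# Illustrative, made-up readouts.
colonies = {
    "A1": ColonyReadout(True, 0.2, {"CYP3A4": 0.95, "CYP2D6": 0.90}),
    "A2": ColonyReadout(True, 0.2, {"CYP3A4": 0.30, "CYP2D6": 0.90}),   # inhibits a P450
    "A3": ColonyReadout(False, 0.1, {"CYP3A4": 0.95, "CYP2D6": 0.95}),  # no MRSA clearing
}
hits = [name for name, readout in colonies.items() if is_hit(readout)]
print(hits)
```

All three criteria must hold at once, which is the point of a multiple parameter
screen: a colony that fails any single parameter is discarded.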

2. Disease target: Cancer - Inhibition of solid tumour growth and prevention of
metastasis (inhibition of NF-kB, inhibition of Cox-2, no inhibition of Cox-1)

Cancer is the second leading cause of death in the US, causing one in every four
deaths. Existing treatments for surgically inoperable cancers include chemotherapy
and radiation treatments. These are highly toxic because they are non-selective or
at best only partially selective. There exists a critical need for new therapeutics to
inhibit tumor growth and prevent metastasis. A premium is placed upon molecules
that prevent metastasis and which work through a selective mechanism so as to
avoid or minimize side effects.

Nuclear Factor kB (NF-kB) is a transcription factor that, by regulating the expression
of multiple inflammatory and immune genes, plays a critical role in host defense and
several pathogenic processes. Its most common inducible form is composed of the
proteins p65 and p50, and usually exists as a molecular complex with one of several
inhibitory molecules, the IkBs, in the cytoplasm. Proteins that are regulated by
NF-kB include TNFα, IL-1β, IL-2, IL-6, IL-8, iNOS, COX-2, intercellular adhesion
molecule-1 (ICAM-1), vascular-cell adhesion molecule-1 (VCAM-1) and E-selectin
(Cancer J., 1998, 4, S92; Int. J. Biochem. Cell. Biol., 1997, 29, (6), 867).

Activation of NF-kB can lead to the synthesis of the inducible form of
cyclooxygenase (COX-2). This enzyme has a critical role in the response of tissues to
injury or infectious agents and is an essential component of the inflammatory response,
the ultimate repair of injury, and carcinogenesis. Several population-based studies
have detected a 40-50% decrease in relative risk for colorectal cancer in persons
who regularly use aspirin and other NSAIDs. Attempts to determine the molecular
basis for these observations found that both human and animal colorectal tumors
express high levels of COX-2, whereas the normal intestinal mucosa has low to
undetectable COX-2 expression. These findings led to the hypothesis that COX-2
plays a role in colon cancer growth and progression (Faseb, 1998, 12, 1063). Since
aspirin also inhibits NF-kB, these findings also suggest that inhibiting NF-kB may
prevent tumour growth and progression. Another way in which COX-2 seems to be
involved in carcinogenesis is by protecting cells from apoptosis (J. Nat. Cancer
Inst., 1998, 90, (11), 802). Therefore, inhibition of NF-kB can help control tumor
growth by one further process, since it leads to less COX-2-induced protection from
apoptosis. Inhibition of NF-kB also leads to an increase in Tumor Necrosis Factor
(TNF), which in turn leads to an increase in apoptosis.

Immense effort is being devoted to developing new molecules that are direct
inhibitors of the enzymatic activity of COX-2. However, an alternative approach is to
find new agents that can prevent expression of the respective genes coding for the
activities, since there are already examples showing that inhibition of a single
mediator does not eliminate all symptoms of a disease (Inflamm. Res., 1997, 46, 282).

Desired therapeutic profile: 


• Selective NF-kB inhibitor: There are several drugs that act by partial inhibition of
NF-kB but they all produce side effects due to interactions with other targets.
Any new NF-kB inhibitor would have to be selective.

• Selective COX-2 inhibitor with a Cox-2/Cox-1 differential inhibitory activity as low
as possible: Prostanoids that are derived from the COX-1 pathway regulate
platelet aggregation via thromboxane A2, the function and integrity of gut
mucosa, and kidney function via prostaglandin E2 and prostacyclin. Cox-2 is
expressed in various cell types, including monocytes, fibroblasts and synovial cells,
in response to inflammatory stimuli. Consequently, COX-1 inhibition by NSAIDs
is associated with gastrointestinal and renal toxicity, whereas COX-2 inhibition
limits the formation of pro-inflammatory prostanoids at the site of the inflammatory
response and has anticancer effects.

• Orally-active: Given the severity of the medical problem, an orally-active drug
would be desired but not essential.

• Safety: The ideal candidate would be selective and devoid of significant
side-effects and drug interactions. However, again, given the severity of many
cancers and the lack of therapeutic options, there is a significant history of compounds
that are less than ideal in these parameters.

MULTIPLE PARAMETER SCREEN:

A multiple parameter screen set up could for example be the double gel
encapsulation of a producer species library with 2 different mammalian cell lines.
The first gel capsule would contain the producer cell and a mammalian cell reporting
NF-kB and Cox-1 inhibition, while the second capsule would contain a second
mammalian cell line reporting Cox-2 inhibition. Gel droplets producing the desired
fluorescence output would be selected.
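
One common way to express a Cox-2/Cox-1 differential is as the ratio of the two
IC50 values, where a lower ratio means stronger selectivity for COX-2. The Python
sketch below assumes that interpretation; the text does not say how the differential
is measured. The function names and the compound values are made up for
illustration.

```python
def cox_selectivity_ratio(ic50_cox2: float, ic50_cox1: float) -> float:
    """Return IC50(COX-2) / IC50(COX-1); lower means more COX-2 selective."""
    if ic50_cox2 <= 0 or ic50_cox1 <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_cox2 / ic50_cox1


def rank_by_cox2_selectivity(ic50s):
    """ic50s maps name -> (IC50 COX-2, IC50 COX-1); most COX-2 selective first."""
    return sorted(ic50s, key=lambda name: cox_selectivity_ratio(*ic50s[name]))


# Hypothetical IC50 values in arbitrary but consistent concentration units.
example = {
    "compound_a": (1.0, 100.0),   # potent on COX-2, weak on COX-1
    "compound_b": (5.0, 5.0),     # non-selective
    "compound_c": (50.0, 1.0),    # COX-1 selective, undesirable here
}
ranking = rank_by_cox2_selectivity(example)
print(ranking)
```

Ranking by this ratio puts the compound that spares COX-1 first, which matches the
desired profile of COX-2 inhibition without COX-1 inhibition.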

3. Disease target: Cancer (survival in the presence of DNA topoisomerase II α
poisons, no production of DNA double strand breaks and inhibition of human
DNA topoisomerase II activity)

Chemotherapy is one of the most common approaches to the treatment of cancer.
All chemotherapy drugs interfere with cell growth, and they all have some form of
side effects. These vary from the highly undesirable to side effects so severe as to
prevent further chemotherapy.

An underlying problem of chemotherapy is that cancer cells are not that different
from normal undifferentiated or fast growing tissues and therefore killing a cancer
cell tends to kill such cells as well. This side effect effectively limits the dose at
which the chemotherapeutic can be applied, and hence limits the efficacy that can
be achieved. Consequently, there is a need for the development of novel
chemotherapeutic agents to overcome these central problems of cancer
chemotherapy.



The most common way to address the above problems of cancer chemotherapy is
to look for compounds or delivery systems that increase the specificity for cancer
cells. However, an alternative approach is to use compounds that protect vulnerable
normal tissue against the proposed chemotherapeutic agent. Such protectants
should of course not be harmful to the normal cells and should either not reach, or
not be functional in, the cancer cells. A number of protectant approaches are in
clinical use today.



Many chemotherapeutic agents, e.g. doxorubicin and etoposide, owe a large part
of their toxicity (and hence clinical utility) to the specific way in which they
"poison" the enzyme topoisomerase II, an enzyme with a crucial role in the
elongation and termination stages of DNA replication. Rather than blocking the
enzyme specifically, these drugs stabilise an intermediate DNA/enzyme/drug
complex, creating double-stranded breaks in the DNA of treated cells. A second
class of structures acts by blocking the topoisomerase II catalytic cycle at other
points in the cycle and does not create double-stranded DNA breaks. These two types
of compounds are antagonists to each other since they stabilise different points in
the cycle: if one binds, the other cannot. Therefore, inhibitors of Topo II can be used
to offset the effects of Topo II poisons.

Two highly homologous isoforms of mammalian topoisomerase II have been
identified in tumor cells, topoisomerase II α (170 kDa) and topoisomerase II β (180
kDa) (Malonne, H. and Atassi, G., Anti-Cancer Drugs, 1997, 8, 811-822). The two
isoforms differ in several biochemical and pharmacological properties, such as
optimal salt concentration for in vitro catalytic activity, thermal stability and sensitivity
to teniposide (a non-intercalative DNA topoisomerase II poison). Topoisomerase II α
is the major drug target isoform in mammalian cells (Sehested et al., Cancer
Research, 1998, 58, 1460-1468).

The discovery of new inhibitors of DNA topoisomerase II would enable the
protection of certain vulnerable tissues against Topo II poisons and hence expand the
efficacy of existing chemotherapy drugs and reduce side effects.



Desired therapeutic profile:

• DNA topoisomerase II α inhibitor: Any new compound would have to be an
inhibitor and not a poison of the enzyme.

• Reversible inhibition of normal cell growth: The effects of the drug should only
last long enough to offset the effects of the chemotherapeutic agent.

• Orally-active: Given the severity of the medical problem, an orally-active drug
would be desired but probably not essential.

• Safety: The ideal candidate would have modest toxicity such that it does not by
itself place an additional toxicity burden on the patient.

Multiple Parameter Screen:



A multiple parameter screen set-up for a cancer chemoprotectant is illustrated in
Fig. 4. In the assay, a producer species library is encapsulated so that on average each
capsule has 1 cell, and allowed to grow for a few generations. These clonal lines are
then double encapsulated with a permeabilised yeast that contains a human DNA
Topo II α reporter system. The gel droplet environment contains etoposide (a
poison) and a DNA double strand break stain. Gel droplets in which the yeast cells in
the outer layer have survived, and which neither fluoresce nor are stained, are selected.
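
Encapsulating at an average of one cell per capsule does not mean that every capsule
holds exactly one cell. If cell loading is assumed to follow a Poisson distribution (a
common modelling assumption for random encapsulation, not something stated in
the text), the expected fractions of empty, singly occupied and multiply occupied
capsules can be estimated as in the Python sketch below. The function names are
illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a capsule for a given mean occupancy."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy_fractions(mean_cells_per_capsule: float) -> dict:
    """Expected fractions of empty, single-cell and multi-cell capsules."""
    empty = poisson_pmf(0, mean_cells_per_capsule)
    single = poisson_pmf(1, mean_cells_per_capsule)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


fractions = occupancy_fractions(1.0)
for label, value in fractions.items():
    print(f"{label}: {value:.3f}")
```

Under this assumption, a mean of one cell per capsule gives about 37% empty
capsules, about 37% capsules with a single cell and about 26% with more than one
cell, so only the singly occupied capsules yield strictly clonal lines.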




4. Disease target: Diabetes (ligand activation of RXRα, ligand-specific activation of
RXR-PPARγ, adipocyte differentiation)

Type 2 diabetes is one of the most common chronic diseases and is associated with
co-morbidities, such as obesity, hypertension, hyperlipidemia and cardiovascular
disease.



Peroxisome proliferator-activated receptors (PPARs) and retinoid X receptors (RXR)
are transcription factors belonging to the family of ligand-inducible nuclear receptors.
There are three related but distinct PPARs, called PPAR-alpha, PPAR-beta/delta
and PPAR-gamma, that form heterodimers with RXR. These receptors regulate
expression of genes involved in fat and carbohydrate metabolism. RXR is unique
among retinoid receptors as it can form homo- or heterodimers with multiple nuclear
receptors including PPARs, retinoic acid receptors (RARs), vitamin D receptor, and
thyroid hormone receptor.



PPARγ/RXR regulates adipogenesis and insulin sensitivity when activated by
PPARγ ligands, RXR ligands, or both. For example, insulin sensitizers, such as the
drugs from the thiazolidinedione class (TZDs), exert their antidiabetic effects through
a mechanism that involves activation of the gamma isoform of the peroxisome
proliferator-activated nuclear receptor (PPARγ).

Activation of RXRα increases activation of PPARγ and insulin sensitivity. Clinical
studies show that co-administration of retinoids (LG100268) and TZDs increases
insulin sensitivity and glucose uptake by 60%.

The retinoid receptors mediate the biological effects of natural and synthetic vitamin
A derivatives, such as retinoic acid. RXR ligands interact with many different
proteins, including members of the following protein families: RXR, RAR, retinoic
acid receptor-related orphan receptor (RZR), cytoplasmic retinoic acid-binding
proteins, retinal-binding protein, P-glycoprotein and cytochrome P450. The
expression level of each of these proteins is likely to affect the potency and efficacy
of retinoids in various cell types.

Rexinoids may have undesirable effects mediated by RXR homodimers or
heterodimer partners other than PPARγ. Rexinoids for treating type 2 diabetes
should thus be selective for the PPARγ/RXR heterodimer.

Desired therapeutic profile:

• Selectivity: RXR agonists should be selective for the PPARγ/RXR heterodimer.

• Orally-active: The ideal drug candidate would be orally-active.

• Safety: The ideal drug candidate would be devoid of significant side-effects and
drug interactions.

Multiple Parameter Screen:

A multiple parameter screen set up could be the gel encapsulation of a producer
species library reporting RXR-RXR activation with a mammalian cell line reporting
PPARγ-RXR activation as well as P450 inhibition. Gel droplets that indicate
PPARγ-RXR activation but not RXR-RXR activation or P450 inhibition are selected.
Figure 5 exemplifies such a system.

Absorption, Distribution, Metabolism, Excretion & Toxicity (ADMET)

Major reasons for the failure of lead compounds in development often involve
inappropriate kinetics or toxicity; thus there is a strong need to obtain the relevant
information as early as possible in the discovery process in order to spend as little
as possible on inadequate compounds. The pharmaceutical and biotech industries
are thus currently focusing on transforming the traditionally very low throughput
processes of physicochemical, pharmacokinetic and toxicity optimization studies into
high throughput selection methods.

Through the use of evolutionary strategies and cell based systems, the present
invention enables the inclusion of ADMET requirements in the lead generation
process and thus significantly reduces the production and screening of thousands of
compounds that are not drug-like.

Solubility

For drugs to be effective they must be able to reach their targets in effective
amounts. In cell-free assays the only limitation that exists in this regard is the
compound's solubility in the assay buffer. In cell-based assays with intracellular
targets, the ability of compounds to diffuse across cell membranes is dependent on
their ability to partition into and out of lipid-rich membranes. This process is more
efficient when compounds have a certain degree of lipophilicity in addition to being
sufficiently water-soluble. If the cell culture medium contains proteins (such as from
the presence of fetal calf serum), the degree of binding of the compound to serum
proteins influences the freely diffusible fraction of compound and hence the amount
available for interaction with the target. The extent of drug binding to serum proteins
has a number of important implications in the living organism, including transport and
distribution.

The present invention uses a host species to produce the compounds. In a preferred
embodiment it uses assays external to the producer species. Thus it is an inherent
part of the process to evaluate the ability of compounds to diffuse across cell
membranes. The presence of medium proteins is also an inherent part of the
system.



Another aspect of the invention is the control of the expression of the host's drug
resistance pumps. This control allows significant pumping of the compounds
produced in the first rounds of screening, when the solubility of compounds
produced is not a key selection criterion. In later rounds of screening the expression
of the pumps will be progressively turned off, so that compounds reaching
the disease targets have to cross the host's cell membrane and thus have a
reasonable solubility profile.
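
The text does not specify how quickly pump expression is reduced between rounds.
As one possible reading, the Python sketch below keeps the pumps fully on for the
first round(s) and then lowers relative expression linearly to zero. The linear ramp,
the function name and its parameters are assumptions made for illustration only.

```python
def pump_expression_schedule(n_rounds: int, full_rounds: int = 1) -> list:
    """Relative pump expression per screening round (1.0 = fully on, 0.0 = off).

    Expression stays at 1.0 for `full_rounds` rounds and then falls linearly
    to 0.0 by the final round.
    """
    if n_rounds < 1 or not 0 <= full_rounds <= n_rounds:
        raise ValueError("need n_rounds >= 1 and 0 <= full_rounds <= n_rounds")
    ramp_rounds = n_rounds - full_rounds
    schedule = [1.0] * full_rounds
    for step in range(1, ramp_rounds + 1):
        schedule.append(1.0 - step / ramp_rounds)
    return schedule


schedule = pump_expression_schedule(n_rounds=5, full_rounds=1)
print(schedule)
```

With five rounds and one fully pumped round, the selection pressure for membrane
permeability rises stepwise until, in the last round, compounds must leave the
producer cell without help from the pumps.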



Absorption

The preferred route of drug delivery is oral administration. The intestinal membrane
permeability is a critical characteristic that determines the extent and rate of drug
absorption and ultimately the bioavailability. Other cells which are relevant for drug
uptake include epithelial, epidermis, nasal, blood-brain and blood-testis barriers, as
well as the kidney, liver, intestinal epithelial and lung cells, which are also routes for
uptake of drugs.

Most models of absorption involve the use of cultured, immortalised cells, which are
generally intestinal in nature and which give a good correlation with absorption in
vivo. Most notable among them are Caco-2 cells, which derive from a human colon
carcinoma cell line, or a subclone of the Caco-2 cell line, TC7. Other useful systems
for absorption studies are the dog kidney cell line, Madin-Darby canine kidney
(MDCK), everted intestinal rings and brush-border membrane vesicles
(BBMV). These cell lines are grown in a confluent monolayer and used for
permeability measurements, which are based on the rate of appearance of test compound
in the receiver compartment. The apical (donor) surface of the monolayer contains
microvilli and thus retains many characteristics of the intestinal brush border.
Furthermore, the apically located efflux pump, P-glycoprotein, the monocarboxylic acid
transporter, the dipeptide transporter, the transporter for large neutral amino acids
(LNAA) [Inui K-I, Yamamoto M, Saito H, J Pharmacol Exp Ther, 1992; 261: 195-
201; Lu S, Guttendorf RJ, Stewart BH, Pharm Res, 1994; 11: S-258] and metabolic
enzymes [Bjorge S, Hamelehle KL, Homan R, Rose SE, Turluck DA, Wright DS,
Pharm Res, 1991; 8: 1441-1443] are all functionally expressed.
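
Permeability measurements of this kind are commonly reported as an apparent
permeability coefficient, Papp = (dQ/dt) / (A x C0), where dQ/dt is the rate of
appearance of compound in the receiver compartment, A is the monolayer area and
C0 is the initial donor concentration. The text does not give this formula, so the
Python sketch below is a standard-practice illustration. The function names and the
data values are assumptions.

```python
def appearance_rate(times_s, receiver_amounts_ug):
    """Least-squares slope of receiver amount (ug) against time (s)."""
    n = len(times_s)
    if n < 2 or n != len(receiver_amounts_ug):
        raise ValueError("need at least two matched time/amount points")
    mean_t = sum(times_s) / n
    mean_q = sum(receiver_amounts_ug) / n
    num = sum((t - mean_t) * (q - mean_q) for t, q in zip(times_s, receiver_amounts_ug))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


def apparent_permeability(times_s, receiver_amounts_ug, area_cm2, donor_conc_ug_per_ml):
    """Papp in cm/s: (dQ/dt) / (A * C0), with 1 ml taken as 1 cm^3."""
    dq_dt = appearance_rate(times_s, receiver_amounts_ug)
    return dq_dt / (area_cm2 * donor_conc_ug_per_ml)


# Hypothetical time course: receiver samples taken every 30 minutes.
times = [0, 1800, 3600, 5400]      # seconds
amounts = [0.0, 0.9, 1.8, 2.7]     # ug of compound in the receiver compartment
papp = apparent_permeability(times, amounts, area_cm2=1.12, donor_conc_ug_per_ml=100.0)
print(f"Papp = {papp:.2e} cm/s")
```

Fitting a slope through several time points, rather than using a single end point,
makes the estimate less sensitive to one noisy sample.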

Figure 6 shows an example of a multiple parameter screen for absorption and a
pharmacological activity. Using a dual culture system and controlling the time of cell
selection, it is possible to select producer cells that have the desired pharmacological
activity and a good absorption profile.

Drugs that inhibit P-glycoprotein can alter the absorption, disposition and elimination
of co-administered drugs and can enhance bioavailability or cause unwanted
drug-drug interactions. Thus another important aspect of absorption studies is to
determine if a compound is a PGP inhibitor by a direct measure of inhibition of
PGP-mediated digoxin transport across polarized human PGP cDNA-expressing
LLC-PK1 cell monolayers.

In mammals, the ABC transporters, like MDR1 and MRP1, have a key role in the
functioning of the blood-brain and blood-testis barriers, as well as the kidney, liver,
lung, and intestinal epithelial cells. MDR1 is expressed normally on apical
membranes of cells derived from excretory tissues, as well as on the luminal surface of
cerebral capillary cells (Gottesman et al., 1993; Cordon-Cardo et al., 1989). MDR1
and MRP1 are present in the epithelia of the choroid plexus (CP) and both
transporters participate in the blood-CSF permeation barrier (Rao et al., 1999).
MDR1-Pgp contributes to the drug-permeation barrier in cerebral capillary endothelial
cells and takes part in elimination of organic cations and xenobiotics from the central
nervous system (CNS) (Rao et al., 1999; Schinkel et al., 1997). MRP1 contributes to
the basolateral broad-specificity drug-permeation barrier in CP, protects this
epithelium from xenobiotics and extrudes organic anions and probably also some
hydrophobic compounds from the CSF (Wijnholds et al., 2000). Some ABC transporters
form and regulate specific membrane channels, while others are involved in the
elimination of detoxified drug-conjugates, transport of phospholipids or bile acids,
and even the initiation of antiviral immune-reaction or specific self-destruction in
various cell types. Moreover, members of the ABC transporter family were shown to
provide multidrug resistance in pathogenic bacteria and parasites (e.g. Plasmodium
and Leishmania species), while also allowing multixenobiotic resistance (MXR) in a
large variety of organisms living in a chemically polluted environment (Kurelec et al.,
1989; 1992).

In order to predict the penetration of a compound through different pharmacological
barriers, a wide range of ABC transporter-compound interactions are also being
tested, e.g. Pgp/MDR1, MRP1, MRP2, MDR3, MRP3, MRP5, MRP6, MXR (BCRP,
ABCG2).

Metabolism

A drug, once it enters an organism, can experience a variety of biological fates.
Drug metabolizing enzymes, including cytochromes P450 (present at high levels in
liver, kidney, gut and other organs), can catalyze the chemical conversion of a
particular drug to entities (metabolites) which are more aqueous-soluble and more
readily excreted than the parent drug from which they were derived. If a parent drug
is inherently metabolically unstable, undesirable pharmacokinetic behavior, such as
an inappropriately short duration of action or poor oral bioavailability, can be
observed. It is therefore common practice in the industry to gain knowledge about
the metabolic stability of lead candidates in order to identify compounds that may
turn out to have poor pharmacokinetic profiles.

In addition, studies in drug metabolism can address the issue of possible drug-drug
interactions, which are closely linked to the safe use of drugs in polytherapies. Most
undesirable drug-drug interactions occur when two or more compounds compete for
the same drug-metabolizing enzyme. The result is usually altered pharmacokinetics
for one or more of the compounds involved, sometimes accounting for compound
blood levels which are outside of the therapeutic window. These types of interactions
can be foreseen with the assistance of studies of the inhibitory effects of test
compounds with specific drug metabolizing enzymes.

Various in vitro methods are available which are being increasingly incorporated into
drug discovery strategies. Among the most popular and widely utilized systems in
use today are hepatic microsomes. These preparations retain activity of those enzymes
that reside in the smooth endoplasmic reticulum, such as cytochromes P450
(CYP), flavin monooxygenases (FMOs), sulfotransferases, UDP-glycosyl transferases,
glutathione transferases and N-acetyl transferases. Isolated hepatocytes appear
to retain a broader spectrum of enzymatic activities, including not only reticular
systems, but cytosolic and mitochondrial enzymes as well. Liver slices, which like
hepatocytes retain a wide array of enzyme activities, are also increasingly used.
Furthermore, both hepatocytes and liver slices can be used to assess enzyme
induction in vitro. Isolated heterologous human CYP enzymes have been available
for several years, being expressed from cDNA in yeast (Saccharomyces cerevisiae),
bacterial (Escherichia coli), and mammalian (B-lymphoblastoid) cell lines [Ohgiya S,
Komori M, Fujitani T, Miura T, Shinriki N, Kamataki T. Biochem Int. 1989; 18: 429-
438; Winters DK, Cederbaum AI. Biochim Biophys Acta 1992; 1156: 43-49; Crespi
CL, Gonzalez FJ, Steimel DT, Turner TR, Gelboin HV, Penman BW, Langenbach R.
Chem Res Toxicol. 1991; 4: 566-572]. These systems have been used to ascertain
whether a compound is a substrate for a particular CYP isozyme and, if so, what
metabolite is generated by that enzyme.

Assays using recombinant human cytochromes P450 (including CYP2D6 and
CYP2C19, which are polymorphically encoded), as well as assays using isozyme-
specific substrate and metabolite combinations in liver microsomal preparations, can
provide valuable information regarding a test compound's drug-drug interaction
potential.

In the present invention the generation of small molecules is carried out by host cells
that can themselves be transformed with a range of enzymes involved in human
metabolism. This minimises the number of false positives generated by compounds
that are rapidly metabolised by the human drug metabolising enzymes (DMEs) and
also leads to the discovery of compounds that are active after being metabolised and
which would otherwise remain undiscovered (see Figure 7).

In another aspect of the invention the drug metabolising enzymes are included
extracellular to the small molecule producer cell in either cell free or cell based assays.
When using cell based assays, one preferred approach is the use of hepatocytes.

In yet another aspect of the invention, the drug metabolising enzyme(s) are associated
with reporter systems in order to gain information on enzyme inhibition. Alternatively,
competition assays for drug-drug interactions can be carried out.

In specific cases, some of the drug metabolising enzymes are themselves the dis- 
ease targets since several of these enzymes are known to be associated with sev- 
eral diseases. 

Conceptually, the drug metabolizing enzymes are divided into two groups. Oxidative
drug metabolizing enzymes, which include CYP450s and FMOs, catalyze the
introduction of an oxygen atom into substrate molecules, generally resulting in
hydroxylation or demethylation. The conjugative enzyme families include the UDP-
glycosyltransferases (UGTs), glutathione transferases (GSTs), sulfotransferases
(SULTs), and N-acetyltransferases (NATs). The conjugative drug metabolizing enzymes
catalyze the coupling of endogenous small molecules to xenobiotics, which
usually results in the formation of soluble compounds that are more readily excreted.

Cytochrome P450s

Cytochrome P450 proteins in humans are drug metabolizing enzymes and enzymes
that are used to make cholesterol, steroids and other important lipids such as
prostacyclins and thromboxane A2. These last two are metabolites of arachidonic
acid. Mutations in cytochrome P450 genes or deficiencies of the enzymes are
responsible for several human diseases. Induction of some P450s is a risk factor in
several cancers since these enzymes can convert procarcinogens to carcinogens.

CYP450 enzymes in the liver catalyze the initial step in the biotransformation of
xenobiotic compounds, including most drugs. These enzymes are members of a large
family of mixed-function oxidases that catalyze the introduction of an oxygen atom
into substrate molecules, often resulting in hydroxylated or dealkylated metabolites.
The metabolism takes place in two phases. Phase I is chemical modification to add
a functional group that can be used to attach a conjugate. The conjugate makes the
modified compound more water soluble so it can be excreted in the urine. Many
P450s add a hydroxyl group in a Phase I step of drug metabolism. The hydroxyl
then serves as the site for further modifications in Phase II drug metabolism.

More than fifty CYP450 isozymes are known to exist in humans and they have been
classified into 18 families and 43 subfamilies based on amino acid sequence
similarities. Proteins from the same family are greater than 40% identical at the amino
acid level, while those in the same subfamily are greater than 55% identical (Nelson,
D.R. (1999) Arch. Biochem. Biophys. 369:1-10). In the standard nomenclature, the
family is designated by a number followed by a letter designation for the subfamily,
and a second number that identifies the individual member of that subfamily.
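
The nomenclature and the identity thresholds described above can be sketched in a
few lines of Python. This is a minimal illustration, not part of the described methods:
the names CypName, parse_cyp and classify_identity are hypothetical, and the
parser assumes names of the form "CYP" + family number + optional subfamily
letter + optional member number.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: int
    subfamily: Optional[str]
    member: Optional[int]


# Family number, then an optional subfamily letter, then an optional member number.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z])?(\d+)?$")


def parse_cyp(name: str) -> CypName:
    """Split a name such as 'CYP2D6' into family 2, subfamily 'D', member 6."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised CYP name: {name!r}")
    family, subfamily, member = match.groups()
    return CypName(int(family), subfamily, int(member) if member else None)


def classify_identity(percent_identity: float) -> str:
    """Apply the >40% (family) and >55% (subfamily) thresholds quoted in the text."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different families"
```

For example, parse_cyp("CYP3A4") yields family 3, subfamily "A" and member 4, while
a name that gives only a family, such as "CYP19", leaves the other two fields empty.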

CYP1 drug metabolism (3 subfamilies, 3 genes, 1 pseudogene)
CYP2 drug and steroid metabolism (13 subfamilies, 16 genes, 16 pseudogenes)
CYP3 drug metabolism (1 subfamily, 4 genes, 2 pseudogenes)
CYP4 arachidonic acid or fatty acid metabolism (5 subfamilies, 11 genes, 10 pseudogenes)
CYP5 thromboxane A2 synthase (1 subfamily, 1 gene)
CYP7A bile acid biosynthesis, 7-alpha hydroxylase of steroid nucleus (1 subfamily member)
CYP7B brain specific form of 7-alpha hydroxylase (1 subfamily member)
CYP8A prostacyclin synthase (1 subfamily member)
CYP8B bile acid biosynthesis (1 subfamily member)
CYP11 steroid biosynthesis (2 subfamilies, 3 genes)
CYP17 steroid biosynthesis (1 subfamily, 1 gene) 17-alpha hydroxylase
CYP19 steroid biosynthesis (1 subfamily, 1 gene) aromatase forms estrogen
CYP20 unknown function (1 subfamily, 1 gene)
CYP21 steroid biosynthesis (1 subfamily, 1 gene, 1 pseudogene)
CYP24 vitamin D degradation (1 subfamily, 1 gene)
CYP26A retinoic acid hydroxylase important in development (1 subfamily member)
CYP26B probable retinoic acid hydroxylase (1 subfamily member)
CYP26C probable retinoic acid hydroxylase (1 subfamily member)
CYP27A bile acid biosynthesis (1 subfamily member)
CYP27B vitamin D3 1-alpha hydroxylase, activates vitamin D3 (1 subfamily member)
CYP27C unknown function (1 subfamily member)
CYP39 unknown function (1 subfamily member)
CYP46 cholesterol 24-hydroxylase (1 subfamily member)
CYP51 cholesterol biosynthesis (1 subfamily, 1 gene, 3 pseudogenes) lanosterol 14-alpha demethylase

The bulk of drugs are metabolised by a few members of the CYP1, 2, and 3 families
and the metabolism occurs primarily in the liver, which contains the highest
concentration of CYP450 in the body. However, the importance of extrahepatic
metabolism in tissues such as the intestine and lung is also recognized.

The xenobiotic metabolizing P450s are approximately 50 kDa proteins anchored in
the endoplasmic reticulum (ER) by a single transmembrane helix in the N-terminus.
Cell fractionation using differential centrifugation results in particulate preparations
enriched in endoplasmic reticulum, commonly referred to as microsomes. Detailed
examination of microsomal fractions from many different individuals has demonstrated
significant variability in expression patterns of individual isozymes; however,
some generalizations are possible (Guengerich, F.P. (1995) Cytochrome P450:
Structure, Mechanism, and Biochemistry (Second Edition), Chapter 14, edited by
Paul R. Ortiz de Montellano, Plenum Press, New York; Shimada, T., et al. (1994) J.
Pharmacol. Exp. Ther. 270:414-23). On average, 70% of the P450s expressed in
adult human liver consist of the following isozymes: 1A2, 2A6, 2B6, the 2C subfamily
(2C8, 2C9, 2C18, and 2C19), 2D6, 2E1, and the 3A subfamily (3A4 and 3A5).

Another very important aspect of the P450s is that polymorphisms cause significant
differences in drug metabolism from population to population and individual to
individual. A polymorphism is a difference in DNA sequence found at 1% or higher in
a population. These differences in DNA sequence can lead to differences in drug
metabolism, so they are important features of P450 genes in humans. CYP2C19
has a polymorphism that changes the enzyme's ability to metabolize mephenytoin (a
marker drug). In Caucasians, the polymorphism for the poor metabolizer phenotype
is only seen in 3% of the population. However, it is seen in 20% of the Asian
population. Because of this difference, it is important to be aware of a person's race
when drugs are given that are metabolized differently by different populations. Some
drugs that have a narrow range of effective dose before they become toxic might be
overdosed in a poor metabolizer. A cytochrome P450 allele website is available from
Sweden at http://www.imm.ki.se/CYPalleles/

Another aspect of the current invention is the ability to evolve drugs designed for
specific populations or even individuals, since the drug metabolic aspects are
addressed during the drug generation process.

Oxidation of organic molecules by P450s is quite complex (Ortiz de Montellano, P.R.
(1995) Cytochrome P450: Structure, Mechanism, and Biochemistry (Second Edition),
Chapter 8, edited by Paul R. Ortiz de Montellano, Plenum Press, New York),
but the overall reaction can be represented simply by Equation 1:

Equation 1: RH + O2 + NADPH + H+ -> ROH + H2O + NADP+

An electron from NADPH is transferred via the flavin domain of NADPH-P450
reductase to the heme domain of the CYP450 where the activation of molecular oxygen
occurs. Substrates react with one of the oxygen atoms and the other is reduced
to water. In some cases, the second electron can come from NADPH via cytochrome
b5 reductase and cytochrome b5. During in vitro reconstitution experiments,
cytochrome b5 can stimulate metabolism of various substrates by some CYP450
isozymes, notably 3A4, 2E1, and 2C9. However, the mechanism of this stimulation
is not clearly understood. Apocytochrome b5 was shown to be as effective as the
holoenzyme in stimulating reconstituted CYP3A4 reactions, so at least in this instance
it does not appear to be playing a direct role in electron transfer (Yamazaki,
H., et al. (1996) J. Biol. Chem. 271:27438-44). The most widely held hypothesis is
that cytochrome b5 acts allosterically to enhance the interaction between CYP450
and NADPH-P450 reductase, or that it improves substrate binding.

Flavin Monooxygenases (FMOs) 

Flavin monooxygenases, like the CYP450 enzymes, are associated with the
endoplasmic reticulum and catalyze the oxidation of organic compounds using molecular
oxygen and NADPH as the source of electrons for the reduction of one of the oxygen
atoms (Equation 1). However, they are mechanistically distinct from the
CYP450s in that they react with oxygen and NADPH in the absence of substrate to
form a 4a-hydroperoxy flavin enzyme intermediate. Thus, the FMOs exist in an
activated form in the cell, and their interaction with a nucleophilic group such as an
amine, thiol, or phosphate is all that is required for completion of the catalytic cycle
(Rettie, A.E. and Fisher, M.B. (1999) in Handbook of Drug Metabolism, pp. 131-147,
edited by Thomas F. Woolf, Marcel Dekker, Inc., New York). The capacity to remain
stable while poised in an activated state is a possible explanation for the extremely
broad substrate specificity of the FMO isozymes. It has been proposed that essentially
all of the energy required for catalysis is captured in the oxygen-activated
intermediate, and that alignment or distortion of the substrate molecules is not
required (Ziegler, D.M. (1993) Annu. Rev. Pharmacol. Toxicol. 33:179). It follows that
the active site of FMOs is much less sterically defined than for other enzymes.
FMO3 is the most abundant form in human liver and is believed to be the dominant
member of this enzyme family in terms of overall drug metabolism (Rettie, A.E. and
Fisher, M.B. (1999) in Handbook of Drug Metabolism, pp. 131-147, edited by Thomas
F. Woolf, Marcel Dekker, Inc., New York).

UDP glycosyltransferases (UGTs)

UDP glycosyltransferases catalyze the glucuronidation of xenobiotics at hydroxyl,
carboxyl, amino, imino, and sulfhydryl groups using UDP-glucuronic acid as a donor
molecule (Equation 2). In general, this generates products that are more hydrophilic
and thus more readily excreted in bile or urine.

Equation 2: UDP-glucuronic acid + R -> UDP + R-glucuronide

Although glucuronidation generally is classified as Phase II metabolism - the phase
occurring after CYP450 dependent oxidative metabolism - many compounds do not
require prior oxidation because they already possess functional groups that can be
glucuronidated. Examples of first-pass metabolism catalyzed by UGTs include the
UGT2B7-dependent glucuronidation of morphine (Coffman, B., et al. (1996) Drug
Metab. Dispos. 25:1-4) and the glucuronidation of 5-lipoxygenase inhibitors (anti-
inflammatories) (Coffman, B., et al. (1997) Drug Metab. Dispos. 25:1032-8); in the
latter case, glucuronidation was demonstrated to be the rate-limiting step for in vivo
plasma clearance. UGTs are 50-60 kDa integral membrane proteins with the major
portion of the protein, including the catalytic domain, located in the lumen of the
endoplasmic reticulum and a C-terminal anchoring region of 15-20 amino acids spanning
the ER membrane (Radominska-Pandya, A., et al. (1999) Drug Metab. Rev.
31:817-99). The aglycone-binding site is believed to be in the N-terminal portion of
the UGT polypeptide, which is the region of the protein that shows the greatest
variability in sequence among UGT isozymes. The UDPGA binding domain is in the
highly conserved C-terminal half of the protein. Although not a certainty, it has been
hypothesized that association with lipid is required for UGT activity and may influence
the access of aglycones to the active site. Two UGT families - UGT1 and UGT2 - have
been identified in humans. Although members of these families are less than 50%
identical in primary amino acid sequence, they exhibit significant overlap in substrate
specificity (Radominska-Pandya, A., et al. (1999) Drug Metab. Rev. 31:817-
99). The members of the UGT1 family that are expressed in human liver, where the
majority of xenobiotic metabolism takes place, include UGT1A1, 1A3, 1A4, 1A6,
and 1A9. Although the UGT2 family has not been as extensively studied, it is known
that UGT2B4, 2B7, 2B10, 2B11 and 2B15 are expressed in the liver (Radominska-
Pandya, A., et al. (1999) Drug Metab. Rev. 31:817-99). As is the case for other drug
metabolizing enzymes such as CYP450s, inter-individual differences in UGT expression
levels have been observed and linked to differences in drug responses (Weber, W.
(1997) Pharmacogenetics, Oxford University Press, New York).

The human UGT1 family includes the major bilirubin metabolizing isoform (UGT1A1)
and the isoform that preferentially conjugates planar phenols (UGT1A6). Isoforms in
the UGT2 family metabolize a variety of endogenous steroid compounds, as well as
xenobiotics. As with the CYP450s, classification of the UGTs based on substrate
specificity is somewhat limited since there is a great deal of overlap in the
biotransformation capacity for most of the human UGTs.

Glutathione transferases (GSTs)

Glutathione transferases catalyze the formation of thioether conjugates between
glutathione (GSH) and reactive xenobiotics by direct addition (Equation 3) or
displacement of an electron-withdrawing group (Equation 4).

Equation 3: GSH + R -> GS-R
Equation 4: GSH + R-X -> GS-R + HX

The major biological function of GSTs is believed to be to provide defense against
electrophilic chemical species. The majority of GSTs are cytosolic homodimers
composed of approximately 25 kDa subunits from one of four structural classes: Alpha
(α), Mu (μ), Pi (π), and Theta (θ). The α isoform (GST A1-1) is restricted to a few
tissues in mammals, including kidney, intestine, lung and liver. The μ isoform (GST
M1-1) is found in the liver, but relatively few other tissues. In contrast, the π isoform
(GST P1-1) is widely distributed throughout the body, although it is notably absent in
the liver. Additionally, GST P1-1 is abundant in most types of tumor cells.

Sulfotransferases (SULTs)

Sulfotransferase enzymes catalyze the conjugation of sulfate groups onto a variety
of xenobiotic and endogenous substrates that possess acceptor moieties such as
hydroxyl and amine groups (Equation 5).



Equation 5: R-XH + PAPS -> R-SO4 + phosphoadenosine + H+

The cofactor 3'-phosphoadenosine 5'-phosphosulfate (PAPS) is required for
sulfonation by these enzymes. Although sulfonation generally causes molecules to lose
their biological activity, several documented examples indicate that the addition of
sulfate can lead to formation of highly reactive metabolic intermediates, such as
minoxidil, and reactive electrophilic cations, such as sulfated N-hydroxy 2-
acetylaminofluorene (McCall, J., et al. (1983) J. Med. Chem. 26:1791-3; Miller, J.A.
(1994) Chem. Biol. Interact. 92:329-41). Several sulfotransferase enzymes with
different biochemical properties have been characterized in animal and human tissue.
Two general classes exist in tissue fractions: the cytosolic enzymes, which are
considered important in drug metabolism; and the membrane bound enzymes, which
are involved in the sulfonation of glycosaminoglycans and glycoproteins (Weinshilboum,
R.M., et al. (1997) FASEB J. 11:3-14). The human cytosolic sulfotransferase
isozymes function as homodimers of 32-35 kDa subunits. There are currently 10
known sulfotransferases in humans, five of which are known to be expressed in
adult liver (SULT1A1, SULT1A2, SULT1A3, SULT1E and SULT2A1). It is expected
that other new genes encoding sulfotransferases will be identified. The nomenclature
of the different genes, their mRNA and protein products has recently been revised
so that "SULT" is the accepted superfamily abbreviation (Raftogianis, R.B., et
al. (1997) BBRC 239:298-304). Allelic variants of sulfotransferase enzymes do exist
and studying their frequency and functional role in drug disposition is a very active
area of research.

N-acetyl Transferases

N-acetyltransferases (NATs) catalyze the biotransformation of aromatic amines or
hydrazines to the respective amides and hydrazides (Equation 6) using acetyl
coenzyme A as a donor. They will also catalyze the O-acetylation of N-
hydroxyaromatic amines to acetoxy esters (Equation 7).

Equation 6: R-NH2 + CoA-S-COCH3 -> R-NHCOCH3 + CoA-SH
Equation 7: R-NHOH + CoA-S-COCH3 -> R-NHOCOCH3 + CoA-SH

There are two known NAT isoforms in humans called NAT1 and NAT2; both are 33
kDa cytosolic proteins found in the liver. NAT1 is also expressed in many other
tissues, whereas NAT2 is expressed only in the liver and gut. The two isoforms have
different, but overlapping substrate specificities, with no single substrate appearing
to be exclusively acetylated by one isoform or the other. Genetic polymorphisms for
N-acetylation are well documented, and may play a role in the susceptibility of
certain individuals to bladder and colon cancer, as the NATs are involved in both the
activation and detoxification of heterocyclic aromatic amine carcinogens (Weber, W.
(1997) Pharmacogenetics, Oxford University Press, New York).

Toxicity 

One of the main forms of toxicity is hepatotoxicity. Freshly isolated human hepatocytes
represent the best in vitro biological system in which to evaluate toxicity. Some
human liver cell lines have been developed that reflect normal human liver metabolism
(e.g. ACTIVTox from Amphioxus Inc. and Hep G2 from Cer^). These cell lines
can be used in cell proliferation assays that give very good correlation with in vivo
results.

In the present invention toxicity assessment is an inherent parameter of the screens
since the compounds are produced in a host organism. Any compound that is very
toxic will not be selected or detected since it will kill the host organism. A more
accurate human toxicity assay can be incorporated in the multiple parameter screening
procedure by, for example, encapsulating hepatocytes with the producer species and
disease target(s) and selecting screening units that have activated the disease target(s)
in the desired way and have not inhibited hepatocyte growth.
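
The selection rule in the preceding paragraph amounts to a joint filter over two
readouts per screening unit. The following Python sketch shows one way to express
it; the ScreeningUnit fields, the select_hits function and the two thresholds are
hypothetical and serve only to illustrate the logic, not any particular assay.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScreeningUnit:
    unit_id: str
    target_signal: float      # reporter readout for the disease target (assumed scale)
    hepatocyte_growth: float  # growth relative to an untreated control, 1.0 = unaffected


def select_hits(units: Iterable[ScreeningUnit],
                min_target_signal: float,
                min_relative_growth: float) -> List[ScreeningUnit]:
    """Keep units that activate the target and do not inhibit hepatocyte growth."""
    return [
        unit for unit in units
        if unit.target_signal >= min_target_signal
        and unit.hepatocyte_growth >= min_relative_growth
    ]
```

Because both conditions must hold for the same unit, a compound with strong target
activity is still rejected when the co-encapsulated hepatocytes fail to grow.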

Figure 8 shows a schematic representation of a screening system of the present
invention to evaluate target activity, metabolism by DMEs and cytotoxicity. Using a
double gel encapsulation system where in the first droplet are clonal lines of the
producer species transformed with the pharmacological target and DMEs, and in the
second droplet are hepatocytes, it is possible to screen for target activity, DME
metabolism and hepatotoxicity.

Mutagenesis 



The mutagenic ability of a compound is another aspect that has to be addressed in
a drug discovery programme. The mutagenicity of a compound can be evaluated by
measuring the reverse-mutation rate in microorganisms. For example, there are
several different strains of the bacterium Salmonella typhimurium with differing and
complementary sensitivities to potential mutagens.

MULTIPLE PARAMETER SCREENING FOR OTHER PURPOSES 
Screening for herbicides. 

The effect of a compound as a herbicide can be screened with in vitro assays.
Primary screens that test the effect as a herbicide include: toxicity, inhibition of
photosynthesis, inhibition of central metabolic enzymes.

Examples of further screens that can be assayed simultaneously with the first group
include: uptake (using hairy root cultures or organ cultures, including shoot cultures),
metabolism, and lack of toxicity towards other plants (in particular crops) or towards
other organisms (animals, humans, insects, fungi).

Screening for fungicides (agricultural) 

Primary screens are much like the ones used in screening for or evolving herbicides
except that fungal cells are used as reporter cells and for uptake.

Secondary screens are also more or less of the same type. One particular screen to
perform is lack of toxicity towards plants, in particular crop plants.

Screening for insecticides 

Primary screens include the assays for the function of the compounds as
insecticides, i.e. cell based assays for toxicity towards a specific species or group of
species of insects, and/or assays for inhibition of specific enzymes in key metabolic
functions of insects, or inhibition of reproduction.

Secondary screens may include uptake in specific insect organs using e.g. a
confluent monolayer of insect cells from the organ in which the insecticide is to be
taken up. A further screen includes metabolism by insect metabolic enzymes to test
whether the compounds are metabolised or activated by these. Furthermore, it is
relevant to screen for lack of toxicity or mutagenicity or teratogenicity towards
animals and/or human beings. Another example of a secondary screen is lack of
toxicity towards other species of insects.

Screening for cosmetics 



Primary screens are directed to the function of the compounds as cosmetics. 

Secondary screens include the same as for screening for or evolution of
pharmaceuticals, i.e. absorption (if relevant), distribution (if relevant), metabolism,
excretion (if relevant) and toxicity, mutagenicity and teratogenicity.

Screening for flavours 

Primary screens may include automatic assaying for the desired flavour. "Artificial
noses" have been developed that can assay for particular flavours or tastes. Artificial
noses, or olfactory or vapor-selective detectors, can detect low levels of odorants.
Examples of such noses are disclosed in e.g. US 6,368,558 and references cited
therein. The technique is also known as artificial olfactometry.

Secondary screens typically include toxicity, mutagenicity, teratogenicity, 
metabolism (by e.g. saliva enzymes). 

Fine chemicals: other examples of multiple parameter screening and evolution
include the evolution and screening for fine chemicals, food and feed additives, and
catalysts.



SCREENING TECHNIQUES 

The selection of the positive cells can be achieved by establishing screens where
only positive cells survive or by physically selecting positive cells. Survival of
positive clones can e.g. be achieved by using assays based on
a. Survival in the presence of toxic substances
b. Survival in the presence of other organisms
c. Use of nutritional reporter genes, e.g. His, or of reporter genes that
when giving the desired response produce a vital protein, e.g. CDC25

Physical selection of positive cells can be done by the use of:
a. FACS & intracellular reporter assays (native or engineered)
b. FACS & gel encapsulation (single, double or more) & extracellular
reporter systems [cell based (native or engineered) or cell free]
c. Overlay assay & extracellular reporter systems [cell based (native or
engineered) or cell free] & picking (manual or automatic)
d. Single clonal cell line confinement to microtiter plate & extracellular
reporter systems [cell based (native or engineered) or cell free] &
picking (manual or automatic)
e. Plating & picking (manual or automatic)

Flow cytometry

In traditional flow cytometry, it is common to analyze very large numbers of cells in a
short period of time. Newly developed flow cytometers can analyze and sort up to
100,000 cells per second. In a typical flow cytometer, individual particles pass
through an illumination zone and appropriate detectors, gated electronically,
measure the magnitude of a pulse representing the extent of light scattered. The
magnitudes of these pulses are sorted electronically into "bins" or "channels",
permitting the display of histograms of the number of cells possessing a certain
quantitative property versus the channel number. It was recognized early on that the
data accruing from flow cytometric measurements could be analyzed (electronically)
rapidly enough that electronic cell-sorting procedures could be used to sort cells with
desired properties into separate "buckets", a procedure usually known as
fluorescence-activated cell sorting.
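
The binning of pulse magnitudes into channels and the subsequent sorting decision
can be sketched as follows. This is a simplified software illustration under stated
assumptions (linear channels, a single measured parameter, hypothetical function
names); real instruments perform these steps in dedicated electronics.

```python
from collections import Counter
from typing import List, Sequence


def channel_of(pulse: float, max_pulse: float, n_channels: int) -> int:
    """Map a pulse magnitude to a linear channel index in [0, n_channels - 1]."""
    if pulse < 0 or max_pulse <= 0 or n_channels <= 0:
        raise ValueError("pulse must be >= 0; max_pulse and n_channels must be > 0")
    index = int(pulse / max_pulse * n_channels)
    # Pulses at or above full scale are clamped into the top channel.
    return min(index, n_channels - 1)


def histogram(pulses: Sequence[float], max_pulse: float, n_channels: int) -> List[int]:
    """Count the number of cells falling into each channel."""
    counts = Counter(channel_of(p, max_pulse, n_channels) for p in pulses)
    return [counts.get(channel, 0) for channel in range(n_channels)]


def sort_gate(pulses: Sequence[float], max_pulse: float, n_channels: int,
              low_channel: int, high_channel: int) -> List[float]:
    """Return the pulses whose channel lies inside the sorting gate (inclusive)."""
    return [
        p for p in pulses
        if low_channel <= channel_of(p, max_pulse, n_channels) <= high_channel
    ]
```

The histogram corresponds to the display of cell number versus channel number, and
the gate corresponds to the decision to divert a cell into a collection "bucket".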

Fluorescence-activated cell sorting has been primarily used in studies of human and
animal cell lines and the control of cell culture processes. Fluorophore labeling of
cells and measurement of the fluorescence can give quantitative data about specific
target molecules or subcellular components and their distribution in the cell
population. Flow cytometry can quantitate virtually any cell-associated property or
cell organelle for which there is a fluorescent probe (or natural fluorescence).

Cell sorters can handle cell sorting at rates of at least 10,000 cells per second, more
preferably at least 50,000 per second, more preferably at least 100,000 per second.
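
To give a feel for what these sorting rates mean in practice, the following sketch computes the time needed to pass a cell population through a sorter once. The population size used in the example is a hypothetical value chosen for the arithmetic, and the function name `sorting_time_hours` is likewise an assumption of this sketch.

```python
def sorting_time_hours(n_cells, cells_per_second):
    """Hours needed to sort n_cells once at a constant sorting rate."""
    if cells_per_second <= 0:
        raise ValueError("cells_per_second must be positive")
    return n_cells / cells_per_second / 3600.0


# Hypothetical population of 360 million cells at the three rates named above.
library_size = 360_000_000
times = {rate: sorting_time_hours(library_size, rate)
         for rate in (10_000, 50_000, 100_000)}
```

Under these assumptions a single pass takes ten hours at 10,000 cells per second, two hours at 50,000 cells per second and one hour at 100,000 cells per second.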

Gel microdroplet encapsulation

The gel microdroplet technology has had significance in amplifying the signals
available in flow cytometric analysis, and in permitting the screening of microbial
strains in strain improvement programs for biotechnology. Wittrup et al.
(Biotechnol. Bioeng. (1993) 42:351-356) developed a microencapsulation selection
method which allows the rapid and quantitative screening of >10^ yeast cells for 
enhanced secretion of Aspergillus awamori glucoamylase. The method provides a 
400-fold single-pass enrichment for high-secretion mutants. 

Gel microdroplet or other related technologies can be used in the present invention
to localize as well as amplify signals in the high throughput screening of cells.
Preferably the screening methods of the present invention are laid out to ensure
survival of the producer cells, so that these can be used for further rounds of
evolution. However, it is also possible to isolate the expression cassettes and
possibly even the artificial chromosomes from the cells and re-insert these into other
host cells, should this be necessary if the cells are killed by the screen.

Different types of encapsulation strategies and compounds or polymers can be used
with the present invention. An encapsulation of particular relevance is the
encapsulation in calcium alginate due to its broad applicability. Furthermore, calcium
alginate beads can be made at room temperature and be dissolved by gentle
procedures leaving the encapsulated cells alive.

A further feature of particular interest is the possibility of coating the beads (or gel
microdroplets) with a lipid layer in order to make them impermeable to small
molecules. This ensures that small molecules do not leak to the surroundings and
that the connection between producer cell and small molecule is not lost during
screening and sorting of gel microdroplets.

Encapsulation techniques may be employed to localize signal, even in cases where
cells are no longer viable. Gel microdrops (GMDs) are small (25 to 200 µm in
diameter) particles made with a biocompatible matrix. In cases of viable cells, these
microdrops serve as miniaturized petri dishes because cell progeny are retained
next to each other, allowing isolation of cells based on clonal growth. The basic
method has a significant degree of automation and high throughput. Cells are
encapsulated together with substrates, and particles containing positive clones are
sorted. Fluorescent substrate labeled glass beads can also be loaded inside the
GMDs. In cases of non-viable cells, GMDs can be employed to ensure localization
of signal.

Encapsulation can be in beads; high temperature agaroses; gel microdroplets made
from agarose, polysaccharide, carbohydrate, alginate, carrageenan, chitosan,
cellulose, pectin, dextran, or polyacrylamide; cells, such as ghost red blood cells or
macrophages; liposomes; or any other means of encapsulating and localizing
molecules.

Gel encapsulated cells may further be enclosed in a layer essentially non-penetrable
to the compounds being screened. Thereby it is ensured that the compounds remain
within the vicinity of the cell and that the physical connection between cell and
compound is not lost. Furthermore, leakage from gel droplet to gel droplet is
prevented. The non-penetrable material may be a lipid material.

Advantageously, cells are encapsulated into gel droplets comprising reporter
system(s) prior to sorting by FACS so that the advantages of FACS are combined
with the advantages of gel droplet screening.

The cells and the reporter system(s) may be encapsulated into one layer of gel
droplets. The cell may also be encapsulated in one layer of the gel droplet and at
least one reporter system encapsulated in another layer of the same gel droplet.
Furthermore, the cell may be encapsulated in one layer of the gel droplet, a first
reporter system encapsulated in another layer of the same gel droplet and at least
a second reporter system encapsulated in yet another layer of the same gel
droplet.


For example, methods of preparing liposomes have been described (i.e., U.S. Pat.
Nos. 5,653,996, 5,393,530 and 5,651,981), as well as the use of liposomes to
encapsulate a variety of molecules (U.S. Pat. Nos. 5,595,756, 5,605,703, 5,627,159,
5,652,225, 5,567,433, 4,235,871, 5,227,170). Entrapment of proteins, viruses,
bacteria and DNA in erythrocytes during endocytosis has been described as well
(Journal of Applied Biochemistry 4, 418-435 (1982)). Erythrocytes employed as
carriers in vitro or in vivo for substances entrapped during hypo-osmotic lysis or
dielectric breakdown of the membrane have also been described (reviewed in Ihler,
G. M. (1983) J. Pharm. Ther.). These techniques are useful in the present invention
to encapsulate samples for screening.

An environment suitable for facilitating molecular interactions includes, for example,
liposomes. Liposomes can be prepared from a variety of lipids including
phospholipids, glycolipids, steroids, long-chain alkyl esters, e.g., alkyl phosphates;
fatty acid esters, e.g., lecithin; fatty amines and the like. A mixture of fatty material
may be employed, such as a combination of a neutral steroid, a charged amphiphile
and a phospholipid. Illustrative examples of phospholipids include lecithin,
sphingomyelin and dipalmitoylphosphatidylcholine. Representative steroids include
cholesterol, cholestanol and lanosterol. Representative charged amphiphilic
compounds generally contain from 12-30 carbon atoms; examples are mono- or dialkyl
phosphate esters and alkyl amines, e.g., dicetyl phosphate, stearyl amine, hexadecyl
amine, dilauryl phosphate, and the like.

Other screening systems 

As an alternative to gel droplet screening, the selection of positive cells meeting the
at least one screening criterion may be performed by means of an overlay assay,
said overlay assay comprising reporter system(s), and manual or automatic picking
of positive cells.

In other screening systems, the selection of positive cells meeting the at least one
screening criterion is performed by placing a single clonal cell line in one well of a
microtiter plate, said well comprising reporter system(s), and manual or automatic
picking of positive cells. This system takes advantage of the many systems
developed for automatic handling and analysis of microtiter plates.

Cells may also simply be plated on medium and positive cells can be picked either
automatically or manually.

Cells may also be engineered so that only positive cells are able to survive. These
cells may be grown in liquid media or be plated.

Evolution towards multiple parameters

Evolution at its most general is a process whereby a set of replicating and varying
patterns is subjected to a selection process that favours the replication of certain of
the variant patterns. The selection process acts on an emergent property
(phenotype) that is encoded by the pattern and that varies as a consequence of the
underlying variation in the pattern. Over the course of a series of replication events
those patterns whose replication is most favoured come to dominate the population.

Variation in the patterns occurs as the result of changes in individual patterns or as
the result of mixing of individual patterns. Which patterns come to dominate the
population is partly a consequence of the selection criteria used and partly a
function of the starting population.

In living organisms and cells the predominant replicating pattern consists of
nucleotide sequences (DNA or, in some viruses, RNA) and the criteria on which
selection acts are typically mediated through other molecules such as (but not limited
to) proteins, metabolites, and structural macromolecules that are encoded by the
nucleotide sequence either directly or indirectly.

In genetic algorithms the replicating pattern consists of software defined magnetic
states and the variation on which selection acts is typically (but not limited to) the
solution of a mathematical algorithm encoded by the magnetic states either directly
or indirectly.

The ability of a pattern to replicate in a given set of environmental parameters is
often referred to as the "fitness" of the pattern. Fitness can be regarded as a
mathematical property that replicating patterns "attempt to" optimise. The higher the
fitness of any given pattern, the greater the chance it will produce one or more
copies of itself, the higher the number of copies it will on average produce, and the
lower the chance it will be destroyed prior to replication. As with any mathematical
function the property that is optimised may itself be a complex function of otherwise
independent properties. Thus evolution can optimise across more than one criterion.
For instance the mating calls of many male insects are optimised to attract females
of the same species whilst not attracting predators. The oxygen binding proteins in
whale blood are optimised to bind oxygen under one set of conditions and release it
under another set of conditions.

Cells containing genetic material are thus in principle able to evolve by virtue of the
variations in the genetic sequence that occur within each cell and the consequences
of this variation upon the fitness of the cell in a given set of environmental
parameters and the ability of the cell to pass these genetic sequences on to
descendant cells.

For the purposes of this invention the term "Fitness Function" shall be taken to mean
a mathematical or algebraic equation that calculates a score and where the variable
elements in the equation are output variables that vary between different cells within
a cell population.

For the purposes of this invention the term "Fitness Score" shall be the score
generated by the fitness function equation.

It shall be understood that any selection process conducted on cells may therefore
be conducted according to the following general procedure:

• The fitness function (F') is defined so that it encapsulates the desired phenotype
  of the cell and mathematically relates this to measurable parameters
• Each cell or group of cells is measured on one or more parameters
• F' for the cell is calculated according to the measured parameters
• Those cells with the highest F' scores are removed from the screening locality
  and allowed to grow. Cells with lower F' scores are discarded. By the highest F'
  score is meant a predetermined percentage of the cells with the highest score,
  such as the best 1%, 5%, 10% or 60%, or for very high selection pressures the
  best 1‰, the best 0.1‰, the best 0.01‰, the best 0.001‰, or the best
  0.0001‰.
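
The general procedure above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: each cell is represented by a dictionary of measured parameters, and the function name `select_top_fraction`, the parameter name `signal` and the example fitness function are hypothetical and not part of the described method.

```python
def select_top_fraction(cells, fitness, fraction):
    """Score every cell with the fitness function and keep the best fraction.

    cells    -- list of measured cells (here: dicts of measured parameters)
    fitness  -- callable returning the fitness score F' of one cell
    fraction -- predetermined share of the population to keep, e.g. 0.05
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(cells, key=fitness, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))  # always keep at least one cell
    return ranked[:n_keep]


# Hypothetical measurement: one reporter signal per cell.
measured = [{"signal": value} for value in range(100)]
kept = select_top_fraction(measured, lambda cell: cell["signal"], 0.05)
```

The cells returned by the function correspond to those removed from the screening locality and allowed to grow; the remainder would be discarded.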



It is an important teaching of evolution that the criteria on which certain patterns are
selected over other patterns are essentially arbitrary - in principle any criterion can be
used. That arbitrary, human imposed criteria can be used to generate an
evolutionary process in a whole organism is exemplified by the evolution of
melanism in moths as a result of industrialisation, the evolution of pedigree dogs
with various properties and the evolution of plants with e.g. enhanced levels of
commercially valuable oils or more even fruiting times or more attractive scents and
colours. The term "breeding" is often used to describe human imposed evolution.
Such organisms have increased their fitness according to a given set of human
imposed criteria. It shall be obvious from these examples that it is not necessary
for the fitness function equation to be explicitly described for the evolution to take
place.



It is a further teaching that fitness functions and consequent selection pressures can
lead to the organism expressing phenotypes that impose high costs on (and even in
some cases kill) the organism. All that is required for this to be the case is that they
confer a countervailing benefit that allows the underlying pattern that produces the
phenotype to spread. One example is the evolution of the peacock's tail, which
whilst making it highly visible and vulnerable to competitors and predators, improves
its ability to attract mates and hence replicate. In organisms with diploid or higher
ploidy and with sexual reproduction it is even possible for patterns that have a net
cost to be maintained in the population at reasonable levels. One example of this is
the maintenance of the sickle cell anaemia mutation in West African human
populations. The heterozygote form of the mutation confers a benefit (by making the
carrier more resistant to malaria) whilst the homozygote is costly (causing severe
anaemia). The positive benefit of the heterozygote results in the underlying pattern
being maintained in the population at a relatively high frequency.



It is a further teaching that multiple selection pressures, acting on a population at
different locations and times, help develop and maintain the variability of replicating
patterns in the population.

It is a further teaching that if two identical selection pressures are applied to two
independent but apparently identical populations then although such populations will
each evolve similar phenotypes the genetic patterns that come to dominate the
population (and that confer the evolved phenotype) may differ between the
populations. An example of different genetic patterns conferring the same
phenotype is streptomycin resistance in bacteria.

From the above it should be clear that organisms are capable of complex
evolutionary responses to a wide range of environmental pressures.

The evolution according to the present invention is based on a series or cycle of
steps of subjecting a composition of cells to screening and selecting cells exhibiting
a predetermined functionality, as shown in Fig. 22. The cycles are repeated until the
desired functionality, for example a target specificity and activity, is obtained. Another
example of a general screening strategy is illustrated in Figure 21.

In other words, the method of evolution according to the present invention is based
on the provision of

1. a suitable set of diverse genetic patterns and also
2. a way of selecting for those genetic patterns within this set that encode for
   phenotypes that are consistent with the desired properties and also
3. a way of generating novel genetic patterns from those patterns that were
   selected in step 2.

These steps may then be combined sequentially or in parallel or on some other
essentially iterative basis. The present invention sets out how to achieve these
requirements.
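
The three requirements can be pictured as a simple loop. The sketch below is an in-silico analogy only, in the spirit of the genetic algorithms mentioned earlier: genotypes are modelled as tuples of gene identifiers, selection keeps the best-scoring fraction, and novel patterns are generated by mixing genes position by position between two selected parents. The function name `evolve`, the tuple representation and the mixing rule are assumptions of this sketch and do not describe the cell-based method itself.

```python
import random


def evolve(population, fitness, n_cycles, keep_fraction, rng):
    """Iterate selection (step 2) and generation of novel patterns (step 3).

    population -- starting set of diverse genotypes (step 1), tuples of equal length
    fitness    -- callable scoring one genotype
    rng        -- random.Random instance, passed in so runs are reproducible
    """
    size = len(population)
    for _ in range(n_cycles):
        # Step 2: keep the genotypes with the highest fitness scores.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, round(size * keep_fraction))]
        # Step 3: build a daughter population by mixing genes of two parents.
        population = []
        while len(population) < size:
            first, second = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(first, second))
            population.append(child)
    return population


rng = random.Random(0)
start = [tuple(rng.randint(0, 9) for _ in range(5)) for _ in range(50)]
final = evolve(start, fitness=sum, n_cycles=10, keep_fraction=0.2, rng=rng)
```

Because this sketch only mixes existing genes and introduces no mutation, every gene in the final population was already present at the same position in the starting population, which illustrates why the diversity of the starting set matters.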



In another aspect of the invention, the methods may be applied to the generation of
a pathway derived from sources from multiple natural kingdoms, phyla or orders in
the host cell. An example of this would be the generation of a pathway to produce
retinoids or other molecules by means of introduction of genes encoding for the
production of carotenoid pathways (obtained from fungi, algae and/or plants) as
well as genes encoding for the synthesis of Vitamin A (obtained from mammals) or
genes encoding for the production of visual pigments (obtained from insects). By
such targeted selection and combination of elements of biochemical pathways
across kingdoms or phyla the likelihood of obtaining novel metabolites may be
further increased.

As previously described a fitness function (F') can be defined that encapsulates the
desired phenotype of the cell and mathematically relates this to one or more
measured outputs. For example the fitness function may be defined as the product
of a cell's absorption at two different wavelengths, or alternatively it may be defined
as the level of inhibition of one enzyme divided by the inhibition of another enzyme,
or it may be defined as the level of cytotoxic poison that a cell can survive multiplied
by the rate of reproduction of the cell in the absence of the cytotoxic poison, or it may
be defined in numerous other ways.
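
The three example definitions can be written out as small functions. This is a sketch only: the function and argument names are hypothetical, the units of the measured outputs are left open, and the handling of a zero denominator in the ratio is an assumption added so that the example is well defined.

```python
def fitness_absorption(absorption_1, absorption_2):
    """Product of a cell's absorption at two different wavelengths."""
    return absorption_1 * absorption_2


def fitness_selectivity(inhibition_a, inhibition_b):
    """Inhibition of one enzyme divided by the inhibition of another enzyme."""
    if inhibition_b == 0:
        # Assumption of this sketch: no inhibition of the second enzyme
        # is treated as unbounded selectivity.
        return float("inf")
    return inhibition_a / inhibition_b


def fitness_tolerance(survivable_poison_level, growth_rate_without_poison):
    """Survivable poison level multiplied by the reproduction rate without poison."""
    return survivable_poison_level * growth_rate_without_poison
```

Each function combines more than one measured output into a single score, so that selecting on the score selects on all of the underlying outputs at once.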

In each screening round cells are selected that have outputs that correspond to one
or more elements of the fitness function. In a preferred embodiment early screening
rounds only measure one output whilst later screening rounds measure multiple
outputs.

Those cells with the highest fitness scores in the population are removed from the
screening environment for later use and/or analysis. Cells with lower F' scores may
be discarded. By the highest F' score can be meant a predetermined percentage
with the highest score, such as the best 1%, 5%, 10% or 50%, or for very intense
selection or very large cell populations the best 1‰, the best 0.1‰, the best
0.01‰, the best 0.001‰ or the best 0.0001‰. Alternatively an absolute fitness score
can be defined and only those cells that exceed this score are selected. By this
approach the percentage of cells that are selected may vary.
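
The alternative of an absolute fitness score can be sketched as follows. The function name `select_above_threshold` and the example scores are assumptions made for illustration; the point of the sketch is that the same threshold selects a different share of two different populations.

```python
def select_above_threshold(scores, threshold):
    """Return the indices of cells whose score exceeds the threshold,
    together with the fraction of the population that was selected."""
    selected = [i for i, score in enumerate(scores) if score > threshold]
    fraction = len(selected) / len(scores) if scores else 0.0
    return selected, fraction


# Two hypothetical populations screened against the same absolute score.
picked_a, share_a = select_above_threshold([1, 5, 7, 3], threshold=4)
picked_b, share_b = select_above_threshold([1, 2, 3, 9], threshold=4)
```

In this example half of the first population but only a quarter of the second population passes the same absolute score.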

In a preferred embodiment of this invention the screening and selection processes
should be conducted on a repetitive or iterative basis, with each iteration being
conducted on a daughter population.

For each iteration of the screening step, the fitness score that the cells are
categorised upon is defined and the cell population subjected to screening. Over a
series of iterations the fitness score is elaborated such that it progressively
approaches the desired target value. The fitness score may be elaborated either by
being increased or by having additional factors added into the equation that derives
the fitness score.

The screening criteria are hence progressively optimised towards the desired
functionality through the necessary rounds or cycles of screening and selection. The
steps are repeated until at least one cell having the desired functionality has been
evolved, such as repeated at least twice, such as at least three times, such as at
least four times, such as at least five times, such as at least ten times, such as at
least twenty times, such as at least fifty times, such as at least one hundred times,
such as at least two hundred times.

In another embodiment the steps are repeated until at least two cell lines, or at least
five cell lines, or at least 10 cell lines, having the desired functionality have been
evolved. In a preferred embodiment at least a part of the cell lines evolved have
different genetic patterns or genotypes. In a more preferred embodiment all the cell
lines evolved have different genetic patterns or genotypes. By the term cell lines is
meant cells originating from cells having met the screening criteria related to the
determined screening functionality.

The screening criteria (or threshold) for one or more outputs may be increased for
each repeat. Increasing criteria may for example be increasing concentration of a
chemical, such as a toxin, in growth media for each repeat, or decreasing
concentration of one or more nutrition components in the growth media or
decreasing sensitivity or proximity of a reporter construct. Other examples of
increasing criteria may be repetitive changes of temperature, either increasing or
decreasing depending on the cell type chosen.

The screening criteria may also change character per repeat, such as starting with a
concentration of a chemical substance in the growth media, and adding a physical
parameter, such as light, in the next repeat, or starting with measuring the activity
against one enzyme and adding activity against another enzyme in the next repeat.

It is also within the scope of the present invention that screening criteria may be a
mixture of the criteria discussed above, i.e. increased concentration of a chemical
combined with changes of physical parameters, and/or increased concentration of
one chemical combined with changed concentration of another.

Through this approach and in accordance with the general principles of evolution,
over a series of screening and selection cycles host lines that most demonstrate the
required characteristics are selected for and come to dominate the population. Over
a series of screens the required fitness score is raised or elaborated, favouring
those combinations that have led to an improvement in the expression of the desired
characteristics.

In one embodiment the host cell lines that are a priori believed to be interesting for a 
given target are selected and the selected lines evolved through a series of screens 
as set out in Figure 22. 

In another embodiment the approach is one of an escalator of selection pressure
using screens that move from the general / low activity to the specific / high activity
with the generation of new genetic patterns between each step.

In another embodiment the fitness score is deliberately raised only marginally
between selection cycles, such as by no more than 50% or by no more than 25% or
by no more than 10% or by no more than 5% or by no more than 1%. Such
gradualist selection pressures allow low level responses to be built upon over a
series of selection cycles. By selecting marginal improvements in the fitness score
such an approach maximises the genetic diversity at each stage in the selection
process.
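
A gradualist schedule of this kind can be written down explicitly. The following sketch computes the required fitness score for each selection cycle when the score is raised by no more than a fixed fraction per cycle. The function name `threshold_schedule`, the starting score and the target score are hypothetical values chosen for the example.

```python
def threshold_schedule(start, target, max_step_fraction):
    """List of required fitness scores, one per selection cycle.

    Each score exceeds the previous one by at most max_step_fraction
    (e.g. 0.25 for "no more than 25%") until the target is reached.
    """
    if start <= 0 or max_step_fraction <= 0:
        raise ValueError("start and max_step_fraction must be positive")
    thresholds = [start]
    while thresholds[-1] < target:
        raised = thresholds[-1] * (1.0 + max_step_fraction)
        thresholds.append(min(raised, target))
    return thresholds


schedule = threshold_schedule(start=100.0, target=150.0, max_step_fraction=0.25)
```

With these example values the required score moves from 100 to 125 and then to 150, so two further selection cycles are needed after the first.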

Specific strategies for pathway generation 

In another embodiment the approach is to walk down a specific multi-step
metabolite pathway in a manner analogous to playing a slot machine. Once the first
step of the pathway is obtained the genetic material for that step is put on "hold" by
increasing its relative abundance such that most cells in the cell population contain
said genetic material and the other genetic materials are then varied (spun or
permed) until the second step is achieved, which is then also put on "hold". This
process is repeated until the entire pathway is obtained.

In another embodiment the approach is to reverse up a specific multi-step
metabolite pathway. Once the last step of the pathway is obtained the genetic
material for that step is put on "hold" by increasing its relative abundance such that
most cells in the cell population contain said genetic material and the other genetic
materials are then varied (spun or permed) until the next but last step is achieved,
which is then also put on "hold". This process is repeated until the entire pathway is
obtained.

Also, a combination of both embodiments may be conducted, so that the pathway is
built up from "both ends".
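
The slot machine analogy can be made concrete with a toy simulation. It is a deliberately simplified sketch: each pathway step is assumed to need exactly one gene, a "spin" is modelled as drawing one random gene from a pool, and a successful screen is modelled as the drawn gene matching the needed gene. The function name `walk_pathway`, the single-letter gene labels and the spin limit are all assumptions of this example.

```python
import random


def walk_pathway(step_genes, gene_pool, rng, max_spins=100_000):
    """Assemble a pathway step by step, holding each step once it is found.

    step_genes -- gene needed for each step, in the order the steps are sought
                  (pass the reversed list to "reverse up" the pathway)
    gene_pool  -- genes that can appear when the non-held material is varied
    rng        -- random.Random instance for reproducibility
    """
    held = []   # genetic material already put on "hold"
    spins = 0
    for needed in step_genes:
        if needed not in gene_pool:
            raise ValueError(f"gene {needed!r} is not present in the pool")
        while True:
            spins += 1
            if spins > max_spins:
                raise RuntimeError("pathway not assembled within max_spins")
            candidate = rng.choice(gene_pool)   # one "spin" of the other genes
            if candidate == needed:             # stands in for a positive screen
                held.append(candidate)          # put this step on "hold"
                break
    return held, spins


pool = list("abcdefgh")
forward, forward_spins = walk_pathway(list("cafe"), pool, random.Random(1))
backward, _ = walk_pathway(list("efac"), pool, random.Random(1))
```

Because each step is held once found, the number of spins grows roughly with the sum of the search effort per step rather than with the effort of finding all steps in one cell at once.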

In one embodiment of the invention the cells are subjected to the selection criteria
under conditions that maximise the number of genes expressed by the cells,
including the genes being heterologous to the cells. Alternatively the cells are
subjected to the selection criteria under conditions that ensure a certain percentage
or set of the genes being heterologous to the cells are expressed.

It should be understood that the above approaches are general in concept and lend
themselves to the construction of many variants, depending on the desired goal.

Furthermore, it should be understood that by using a cell-based system an
advantage is that the compounds may be selected also on parameters not being
included in the fitness function, in that the system inherently promotes evolution of
compounds exhibiting properties such as not being toxic to the cell, as well as
compounds that diffuse rapidly within the cell.

Examples of the approaches to build known or structural class focused pathways
are as follows:

For small to medium sized pathways, i.e. pathways of up to 6-7 steps from
metabolites of the host cell, the screening strategy relies on enriching the founder
population with relevant genes and on the reasonably high probability of assembling
over a series of selection rounds pathways that produce a low level of the desired
property.




For large pathways (i.e. more than 6-7 steps) the screening strategy mostly involves
dividing the pathway into subsets and a) defining screening parameters for each
subset in order to build a pathway forwards or b) identifying intermediate metabolites
that are fed to the cell population in order to assemble the pathway backwards.

For example in the case of retinoid like compounds it is well known that carotenoids
are metabolised by specific tissues in specific classes of organisms to produce
retinoids. It is thus possible to first evolve a population of cells that produce
carotenoids and then mix the genes of this population(s) with those of a population(s)
enriched for retinoid genes and in this manner evolve a population that produces
retinoid like compounds.



Another example is the case of Taxol like compounds, for which the exact
biosynthetic pathway is not known but is predicted to be somewhere between 12 and 20
enzymatic steps from yeast metabolites and several of the intermediate compounds
have been isolated. It is thus possible to start by feeding a metabolite that is a few
steps from Taxol in order to identify a population of cells able to produce Taxol like
compounds from this precursor. Once this is achieved, the genes responsible for
that small pathway are locked, e.g., integrated in the host's genome, or incorporated
in artificial chromosomes at such high levels that statistically they occur in most cells,
and a second evolution process is started. This time the precursor being fed to the
cell population is an earlier metabolite from the Taxol biosynthesis. By repeating these
partial evolutions a number of times, it is possible to evolve a population of cells that
produce Taxol like compounds starting with host metabolites.

Finally it should be said it is also possible to produce a class of compounds using a
combination of both approaches described, i.e., by starting simultaneous evolution
processes that cover the pathway backwards and forwards.

Diverse Genetic Patterns

Given that evolution is a statistical process it is necessary to provide sufficient
genetic variation on which selection processes can act. In the present invention, this
comprises two elements:
• Providing a sufficiently large and diverse population
• Controlling the genetic basis of the diversity and how it expresses

Selection requires genetic diversity on which to operate. Thus the first requirement
of the current invention is to provide a population of cells that embodies a genetic
diversity. The term "genetic diversity" means that substantially all cells are different,
in that they comprise different genes, and/or identical genes under control of
different control systems, such as different promoters, such that almost each cell
initially represents a genotype not represented in any of the other cells. Of course
due to cell division a few cells may be substantially identical.

The term "Cell Population" shall be taken to mean a population of cells where at
least 10^4 cells, such as at least 10^5 cells, such as at least 10^6 cells, such as at least
10^7 cells, such as at least 10^8 cells, such as at least 10^9 cells, such as at least 10^10
cells, such as at least 10^11 cells, such as at least 10^12 cells in the population
represent a genotype not represented in any of the other cells.
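
The criterion in this definition, namely the number of cells that represent a genotype not represented in any other cell, can be expressed as a simple count. The sketch assumes that each cell's genotype is available as a hashable label, which is a simplification made for illustration; the function name `count_unique_genotypes` is hypothetical.

```python
from collections import Counter


def count_unique_genotypes(genotypes):
    """Number of cells whose genotype occurs in no other cell of the population."""
    occurrences = Counter(genotypes)
    return sum(1 for n in occurrences.values() if n == 1)


# Hypothetical population: the two "b" cells arose by cell division and
# therefore do not count as unique.
population = ["a", "b", "b", "c"]
unique_cells = count_unique_genotypes(population)
```

A population would meet the definition when this count reaches the stated minimum number of cells.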

Thus, the principle of the evolution method according to the invention is to obtain a
population of cells having a very high genetic diversity.

One particular embodiment of this principle is to produce cells with combinations of
concatemers comprising cassettes with expressible nucleotide sequences from a
number of different expression states, which may be from any number of unrelated
or distantly or closely related species, or from species from different kingdoms or
phyla. Novel and random combinations of gene products are thereby produced in one
single cell.

By inserting novel genes into the host cell, and especially by inserting a high number
of novel genes from different expression states, such as from a wide variety of
species, into a host cell, the gene products from this array of novel genes will interact
with the pool of metabolites of the host cell and with each other and modify known
metabolites and/or intermediates in novel ways to create novel compounds. Due to
the high number of substantially different cells that can be generated using the 
methods according to the present invention, for example at least 10^4 cells, such as
at least 10^5 cells, such as at least 10^6 cells, such as at least 10^7 cells, such as at
least 10^8, such as at least 10^9, for example at least 10^10, such as at least 10^11, it is
more or less inevitable or at least likely that such large populations will lead to a
sub-population having such an interaction. The sub-population having such
interaction may comprise at most 10^10 cells, such as at most 10^9 cells, such as at
most 10^8, such as at most 10^7 cells, such as at most 10^6 cells, such as at most 10^5
cells, such as at most 10^4 cells, such as at most 10^3 cells, such as at most 10^2 cells
or just 10 cells.

HOST CELLS 

The host cells selected for this purpose are preferably cultivable under standard
laboratory conditions using standard culture conditions, such as standard media and
protocols. Preferably the host cells comprise a substantially stable cell line, in which
the concatemers can be maintained for generations of cell division in a suitable
manner. It is also of great advantage that standard techniques for transformation of
the host cells are available, especially that methods are known for insertion of
artificial chromosomes into the host cells.

It is also of advantage if the host cells are capable of undergoing meiosis to perform
sexual recombination. It is also advantageous that meiosis is controllable through
external manipulations of the cell culture. One especially advantageous host cell
type is one where the cells can be manipulated through external manipulations into
different mating types.

The host cell should preferably be conditionally deficient in the ability to undergo
homologous recombination. The host cell should preferably have a codon usage
similar to that of the donor organisms. Furthermore, in the case of heterologous
genomic DNA, if eukaryotic donor organisms are used, it is preferable that the host
cell has the ability to process the donor messenger RNA properly, e.g., splice out
introns.

30 The cells can be bacterial, archaebacteria. or eukaryotic and can constitute a 
homogeneous cell line or mixed culture. Suitable cells include the bacterial and 
eukaryotic cell lines commonly used In genetic engineering and protein expression. 
Suitable mammalian cells include those from, e.g., mouse, rat, hamster, primate, 
and human, both cell lines and primary cultures. 

Preferred prokaryotic host organisms may include but are not limited to Escherichia
coli, Bacillus subtilis, B. licheniformis, B. cereus, Streptomyces lividans,
Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus,
Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Bacillus,
Pseudomonas, Salmonella, and Erwinia. The complete genome sequences of E.
coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462
(1997) and Kunst et al., Nature 390, 249-256 (1997).

Preferred eukaryotic host organisms are mammals, fish, insects, plants, algae and
fungi.

Examples of mammalian cells include those from, e.g., monkey, mouse, rat,
hamster, primate, and human, both cell lines and primary cultures. Preferred
mammalian host cells include but are not limited to those derived from humans,
monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS,
293, VERO, HeLa etc. (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990), and stem cells, including
embryonic stem cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes,
kidney, liver, muscle, and skin cells.

Examples of insect cells include baculo lepidoptera.

Examples of plant cells include maize, rice, wheat, cotton, soybean, and sugarcane.
Plant cells such as those derived from Nicotiana and Arabidopsis are preferred.

Examples of fungi include Penicillium, Aspergillus, such as Aspergillus nidulans,
Podospora, Neurospora, such as Neurospora crassa, Saccharomyces, such as
Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces, such as
Schizosaccharomyces pombe (fission yeast), Pichia spp., such as Pichia pastoris,
and Hansenula polymorpha (methylotrophic yeasts).

The choice of host will depend on a number of factors related to the intended
use of the engineered host, including pathogenicity, substrate range, environmental
hardiness, presence of key intermediates, ease of genetic manipulation, and
likelihood of promiscuous transfer of genetic information to other organisms.

Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes,
Actinomycetes and filamentous fungi.

A preferred host cell is yeast due to the following characteristics: it is fast growing,
eukaryotic, allows scalable culture capabilities, genetic tools are available, it is
metabolically flexible, can have a relatively permeable cell membrane/wall and folds
more heterologous eukaryotic proteins correctly than prokaryotic cells.

Thus, an illustrative and not limiting list of suitable yeast host cells comprises: baker's
yeast, Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia rhodozyma,
Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha, Yarrowia
lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia stipitis, Candida
shehatae, Rhodotorula glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida
spp. (e.g. C. palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida,
Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida
brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis,
Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium
ashbyii, Pichia spp., Pichia pastoris, Schizosaccharomyces pombe (fission yeast),
Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or Torulopsis
bombicola.

In any one host cell it is possible to make all sorts of combinations of expressible
nucleotide sequences from all possible sources. Furthermore, it is possible to make
combinations of promoters and/or spacers and/or introns and/or terminators in
combination with one and the same expressible nucleotide sequence.

In a preferred embodiment the cells to be evolved are produced by inserting
concatemers comprising the multitude of cassettes into a host cell, in which the
concatemers can be maintained and the expressible nucleotide sequences can be
expressed in a co-ordinated way. The cassettes comprised in the concatemers may
be cut out from the host cell and re-assembled due to their uniform structure with,
preferably, compatible restriction sites between the cassettes.

The cells as defined in the present invention are preferably collected into
populations for use in the present invention. The composition of cells subjected to
evolution is then produced by selecting cells from a population or from several
sub-populations. A population of individual cells is a population of expression constructs
prepared from randomly assembled or even concatenated expressible nucleotide
sequences derived from a plurality of species of donor organisms, in which
expressible nucleotide sequences are operably associated with regulatory regions
that drive expression of the expressible nucleotide sequences in an appropriate
host cell. The host cells used are capable of producing functional gene products of
the donor organisms. Upon expression in the host cell, gene products of the donor
organism(s) may interact to form novel biochemical pathways.

The population according to this embodiment of the invention may in any one cell
comprise a unique and preferably random combination of a high number of
expression cassettes being heterologous to the host cells. Through this random
combination of expression cassettes novel and unique combinations of gene
products are obtained in each cell. Such populations are especially adapted for the
discovery of novel metabolic pathways created through the non-native combinations
of gene products.

In a preferred embodiment a population may be defined as a population comprising
a collection of individual cells, the cells being denoted
cell_1, cell_2, ..., cell_i, wherein i ≥ 2,

each cell comprising at least one concatemer of individual oligonucleotide
cassettes, each concatemer comprising a nucleotide sequence of the following
formula:

[rs2-SP-PR-X-TR-SP-rs1]n

wherein rs1 and rs2 together denote a restriction site, SP denotes a spacer of at
least two bases, X denotes an expressible nucleotide sequence, PR denotes a
promoter, capable of regulating the expression of X in the cell, TR denotes a
terminator, and n ≥ 2, and

wherein at least one concatemer of cell_1 is different from a concatemer of cell_2.

In the present context the nucleotide sequence of the formula
[rs2-SP-PR-X-TR-SP-rs1]n is also referred to as an expression cassette of the
formula [rs2-SP-PR-X-TR-SP-rs1]n.

Sub-populations may comprise cells as defined above for populations, but mostly
the cells of a sub-population will have at least one trait in common, such as common
promoter combinations, genetic material from a common species, a common
phenotype or the like.
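The population definition above can be illustrated with a minimal sketch. It models a cassette as the ordered unit rs2-SP-PR-X-TR-SP-rs1, a concatemer as n ≥ 2 such cassettes, and checks the stated condition that at least one concatemer of cell_1 differs from a concatemer of cell_2. The class and function names, the placeholder spacer "NN", and the gene and terminator labels are assumptions made for illustration only; they are not taken from the application.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Cassette:
    """One unit of the form rs2-SP-PR-X-TR-SP-rs1 (illustrative model)."""
    promoter: str       # PR
    sequence: str       # X, the expressible nucleotide sequence
    terminator: str     # TR
    spacer: str = "NN"  # SP, placeholder spacer of at least two bases

    def __post_init__(self) -> None:
        if len(self.spacer) < 2:
            raise ValueError("SP must be at least two bases long")

    def layout(self) -> str:
        # Symbolic element order only; no real DNA sequence is built here.
        return f"rs2-SP-{self.promoter}-{self.sequence}-{self.terminator}-SP-rs1"


Concatemer = Tuple[Cassette, ...]


def make_concatemer(cassettes: Sequence[Cassette]) -> Concatemer:
    """Build a concatemer, enforcing n >= 2 cassettes."""
    if len(cassettes) < 2:
        raise ValueError("a concatemer needs n >= 2 cassettes")
    return tuple(cassettes)


def is_population(cells: Sequence[Sequence[Concatemer]]) -> bool:
    """At least two cells, each with at least one concatemer, and at least one
    concatemer of cell_1 different from a concatemer of cell_2."""
    if len(cells) < 2 or any(len(cell) == 0 for cell in cells):
        return False
    return any(c1 != c2 for c1 in cells[0] for c2 in cells[1])


# Hypothetical cassettes; gene_A..gene_C and TR1 are made-up labels.
a = Cassette("GAL1", "gene_A", "TR1")
b = Cassette("CUP1", "gene_B", "TR1")
c = Cassette("MET25", "gene_C", "TR1")
cell_1 = [make_concatemer([a, b])]
cell_2 = [make_concatemer([a, c])]
```

Here `is_population([cell_1, cell_2])` holds because the two cells carry different concatemers, whereas two cells carrying the same single concatemer would not satisfy the definition.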

The function of the populations and sub-populations is to act as a source of diversity
when obtaining the composition of cells to be evolved. Thus, in one embodiment the
composition is a collection of sub-compositions, wherein a sub-composition is a
collection of individual cells having at least one phenotype in common. In a preferred
embodiment the composition comprises at least 2 individual sub-compositions, said
sub-compositions being different, such as at least 5 individual sub-compositions,
such as at least 10 individual sub-compositions, wherein each sub-composition
comprises at least 10 individual cells, such as at least 50 individual cells, such as at
least 100 individual cells, such as at least 10^3 individual cells, such as at least 10^4
individual cells, such as at least 10^5 individual cells, such as at least 10^6 individual
cells, such as at least 10^7 individual cells, such as at least 10^8 individual cells, such
as at least 10^9 individual cells.

The composition of cells preferably comprises at least 20 individual cells, such as at
least 50 individual cells, such as at least 100 individual cells, such as at least 150
individual cells, such as at least 200 individual cells, such as at least 250 individual
cells, such as at least 500 individual cells, such as at least 750 individual cells, such
as at least 1000 individual cells, such as at least 10^4 individual cells, such as at least
10^5 individual cells, such as at least 10^6 individual cells, such as at least 10^7
individual cells, such as at least 10^8 individual cells, such as at least 10^9 individual
cells.

In a preferred embodiment at least a majority of the individual cells have different
genetic patterns or genotypes, thereby representing a great diversity.

The term "founding population" or "founder population" shall mean a cell
population that has not itself been subjected to a selection round, in the present
context also referred to as composition of cells. Optionally the expression constructs
within the cell population are constructed such that genetic material from species
that are known from prior art to produce compounds of a desired structure class, or
compounds that have a desired functional effect or are associated with a desired
functional effect independent of knowledge of the compounds, predominates.

The term "daughter population" means a cell population having been subjected to at
least one selection round. In the present context the daughter population is also
referred to as a further modified composition.



Controlling The Genetic Basis of the Diversity 



Sources of Genes

The natural world contains a significant amount of genetic diversity. Various
authorities estimate that there are at least 10^7 different species, and that each of
these species contains on average at least 10^4 genes. Even allowing for the fact that
many of these genes are relatively conserved between species this represents a
high level of genetic diversity.

One approach that can be envisaged for the purposes of the current invention is to
source genetic material so as to maximise the taxonomic diversity of the genes
obtained.



A second is to preferentially source genetic material from organisms that are known
or reputed to produce molecules of the structural class or with the functional effects
desired, or are known or reputed to have a desired functional effect without the
molecule being known, or are taxonomically related to any such organism.

A third approach is selection of genes of particular interest.

A fourth approach is to select genes that generally extend the host metabolic
pathways.

Optionally these approaches can be combined in any suitable manner. 



Genes can be sourced through the collection and processing of genetic material of
various forms. The expressible nucleotide sequences that can be inserted into the
vectors, concatemers, and cells according to this invention encompass any type of
nucleotide such as RNA or DNA. Such a nucleotide sequence could be obtained e.g.
from cDNA, which by its nature is expressible. But it is also possible to use
sequences of genomic DNA, coding for specific genes. Preferably, the expressible
nucleotide sequences correspond to full length genes such as substantially full
length cDNA, but nucleotide sequences coding for shorter peptides than the original
full length clones may also be used. Shorter peptides may still retain the catalytic
activity of the native proteins. Thus, a preferred embodiment of this invention is to
source and collect messenger transcripts (mRNA) for obtaining cDNA.

Another way to obtain expressible nucleotide sequences is through chemical
synthesis of nucleotide sequences coding for known peptide or protein sequences.
Thus the expressible DNA sequences do not have to be naturally occurring
sequences, although it may be preferable for practical purposes to primarily use
naturally occurring nucleotide sequences. Whether the DNA is single or double
stranded will depend on the vector system used.

By the term "expression state" is meant a state of gene expression (i.e. the mRNA
transcript population) in a specific cell, tissue, combination of tissues or organism or
organisms of a given species as sampled at any one time. Different expression
states are found in different individuals, or in the same individual at different points in
time, or in the same individual at different points in its life-cycle or in the same
individual under differing external conditions. The expression states of given cells or
tissues of a given individual will also vary with respect to other cells or tissues of the
same individual. Different expression states may also be obtained in the same organ
or tissue in any one species or individual by exposing the tissues or organs to
different environmental conditions comprising but not limited to changes in
developmental stage, age, disease, infection, drought, humidity, salinity, exposure to
xenobiotics, physiological effectors, temperature, pressure, pH, light, gaseous
environment, chemicals such as toxins.

In the following the invention is described in the order in which the steps of obtaining
a transformed host cell containing an evolvable artificial chromosome may be
performed, starting with the entry vector.

In most cases the orientation with respect to the promoter of an expressible
nucleotide sequence will be such that the coding strand is transcribed into a proper
mRNA. It is however conceivable that the sequence may be reversed, generating an
antisense transcript in order to block expression of a specific gene.

Each cell of the cell population is initially produced by combining genes selected
from at least one expression state. It is of course also possible from the outset to
combine genes from two, three, four or more expression states in one host cell or to
combine genes from different organisms in one cell. In some embodiments of the
invention it is preferred to combine genes from a large variety of organisms into a
single host in a manner so that each cell comprises at least two expressible
nucleotide sequences, said sequences being heterologous to the cell, i.e. the
sequences are not found in the native cell type.

A wide variety of combinations of expressible nucleotide sequences from all
possible sources may occur in the cells. Furthermore, it is possible to make
combinations of promoters and/or spacers and/or introns and/or terminators in
combination with one and the same expressible nucleotide sequence.

Thus in any one cell there may preferably be expressible nucleotide sequences from
two different expression states. Furthermore, these two different expression states
may be from one species or advantageously from two different species. Any one
host cell may also comprise expressible nucleotide sequences from at least three
species, such as from at least four, five, six, seven, eight, nine or ten species, or
from more than 15 species such as from more than 20 species, for example from
more than 30, 40 or 50 species, such as from more than 100 different species, for
example from more than 300 different species, such as from more than 500 different
species, for example from more than 750 different species, thereby obtaining
combinations of large numbers of expressible nucleotide sequences from a large
number of species. In this way potentially unlimited numbers of combinations of
expressible nucleotide sequences can be combined across different expression
states. These different expression states may represent at least two different
tissues, such as at least two organs, such as at least two species, such as at least
two genera. The different species may be from at least two different phyla, such as
from at least two different classes, such as from at least two different divisions, more
preferably from at least two different sub-kingdoms, such as from at least two
different kingdoms. Thus expressible nucleotide sequences may be combined from
a eukaryote and a prokaryote into one and the same cell.

According to another embodiment of the invention, the expressible nucleotide
sequences may be from one and the same expression state. The products of these
sequences may interact with the products of the genes in the host cell and with each
other and form new enzyme combinations leading to novel biochemical pathways.

Sources of genetic diversity

Examples of groups of species and individual species known to produce compounds
with structural or functional utility include without limitation

Bacteria: Streptomyces, Micromonospora, Nocardia, Actinomadura, Actinoplanes,
Streptosporangium, Microbispora, Kitasatosporia, Azobacterium, Rhizobium,
Achromobacterium, Enterobacterium, Brucella, Micrococcus, Lactobacillus, Bacillus
(B.t. toxins), Clostridium (toxins), Brevibacterium, Pseudomonas, Aerobacter, Vibrio,
Halobacterium, Mycoplasma, Cytophaga, Myxococcus

Fungi: Amanita muscaria (fly agaric, ibotenic acid, muscimol), Psilocybe (psilocybin),
Physarium, Fuligo, Mucor, Phytophtora, Rhizopus, Aspergillus, Penicillium
(penicillin), Coprinus, Phanerochaete, Acremonium (Cephalosporin), Trichoderma,
Helminthosporium, Fusarium, Alternaria, Myrothecium, Saccharomyces

Algae: Digenea simplex (kainic acid, antihelminthic), Laminaria angustata (laminine,
hypotensive)

Lichens: Usnea fasciata (vulpinic acid, antimicrobial; usnic acid, antitumor)

Higher Plants: Artemisia (artemisinin), Coleus (forskolin), Desmodium (K channel
agonist), Catharanthus (Vinca alkaloids), Digitalis (cardiac glycosides), Podophyllum
(podophyllotoxin), Taxus (taxol), Cephalotaxus (homoharringtonine), Camptotheca
(Camptothecin), Camellia sinensis (Tea), Cannabis indica, Cannabis sativa (Hemp),
Erythroxylum coca (Coca), Lophophora williamsii (Peyote), Myristica fragrans
(Nutmeg), Nicotiana, Papaver somniferum (Opium Poppy), Phalaris arundinacea
(Reed canary grass)

Protozoa: Ptychodiscus brevis; Dinoflagellates (brevitoxin, cardiovascular)

Sponges: Microciona prolifera (ectyonin, antimicrobial), Cryptotethya crypta
(D-arabino furanosides)

Coelenterata: Portuguese Man o War & other jellyfish and medusoid toxins

Corals: Pseudopterogorgia species (pseudopterosins, anti-inflammatory),
Erythropodium (erythrolides, anti-inflammatory)

Aschelminths: Nematode secretory compounds

Molluscs: Conus toxins, sea slug toxins, cephalopod neurotransmitters, squid inks

Annelida: Lumbriconereis heteropoda (nereistoxin, insecticidal)

Arachnids: Dolomedes (fishing spider venoms)

Crustacea: Xenobalanus (skin adhesives)

Insects: Epilachna (Mexican bean beetle alkaloids)

Spinunculida: Bonellia viridis (bonellin, neuroactive)

Bryozoans: Bugula neritina (bryostatins, anti-cancer)

Echinoderms: Crinoid chemistry

Tunicates: Trididemnum solidum (didemnin, anti-tumor and anti-viral); Ecteinascidia
turbinata (ecteinascidins, anti-tumor)

Vertebrates: Eptatretus stoutii (eptatretin, cardioactive), Trachinus draco
(proteinaceous toxins, reduce blood pressure, respiration and reduce heart rate),
Dendrobatid frogs (batrachotoxins, pumiliotoxins, histrionicotoxins, and other
polyamines); Snake venom toxins; Ornithorhynchus anatinus (duck-billed platypus
venom), modified carotenoids, retinoids and steroids; Avians: histrionicotoxins,
modified carotenoids, retinoids and steroids



Controlling Gene Expression - Expression Cassettes 

Genes primarily give rise to selectable phenotypes through transcription of the gene
to RNA and translation of the RNA to protein. Furthermore phenotypes are often the
result of interactions between multiple genes and their gene products.

Thus it is an element of the current invention that the heterologous genes are
provided in a format whereby their individual and collective expression (transcription
to RNA) can be controlled.

It is likely that through the combination of a high number of non-native genes in a
host cell combinations of genes or single genes are inserted that are lethal or
sub-lethal to the host cell. Through the co-ordinated expression of the genes in the host
cell it is possible not only to initiate the expression of any subset of genes but also to
repress such expression, e.g. of lethal or sub-lethal genes.

Through external regulation of the promoters controlling the expressible nucleotide
sequences novel and non-naturally occurring combinations of expressed genes can
be obtained. Since these novel and non-natural combinations of gene products are
found in one and the same cell, the heterologous gene products may affect the
metabolism of the host cell in novel ways and thus cause it to produce novel primary
or secondary metabolites and/or known metabolites in novel amounts and/or known
metabolites in novel compartments of the cell or outside the cells. The novel
metabolic pathways and/or novel or modified metabolites may be obtained without
substantially recombining the introduced genes with a segment in the host genome
or an episome of the host cells, as well as without intra- or extra-concatemeric
recombination.

By having expressible nucleotide sequences under the control of a number of
independently inducible or repressible promoters, a large number of different
expression states can be created inside one single cell by selectively turning on and
off groups of the inserted expressible nucleotide sequences. The number of
independently inducible and/or repressible promoters in one cell may vary from 1 to
10, such as 2, 3, 4, 5, 6, 7, 8, or 9, or even up to 15, 20, 25 or above 50 promoters.
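As a rough illustration of why a handful of independently controllable promoters yields many expression states, the sketch below enumerates every on/off assignment. It assumes each promoter is treated as strictly binary (on or off), which is a simplification introduced here; under that assumption k promoters give 2^k states. The promoter names are taken from the list of examples in this document, while the function names and gene labels are hypothetical.

```python
from itertools import product


def expression_states(promoters):
    """Yield every on/off assignment for independently controllable promoters."""
    for flags in product((False, True), repeat=len(promoters)):
        yield dict(zip(promoters, flags))


def expressed_genes(state, genes_by_promoter):
    """Return the sorted genes whose controlling promoter is switched on."""
    return sorted(
        gene
        for promoter, is_on in state.items()
        if is_on
        for gene in genes_by_promoter[promoter]
    )


# Three promoters, each assumed to be switchable independently of the others.
states = list(expression_states(["GAL1", "CUP1", "MET25"]))
```

With three promoters `len(states)` is 8; with 10 promoters the same enumeration would give 1024 assignments, each corresponding to a different set of expressed cassettes in the same cell.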

In the evolution steps the functionality of the controllable promoters of the cells is
used, since due to the controllable promoters it is possible during the screening and
selection step to switch promoters on and off, thereby creating a greater diversity of
expressed genes.

The term promoter is used with its normal meaning, i.e. a DNA sequence to which
RNA polymerase binds and initiates transcription. The promoter determines the
polarity of the transcript by specifying which strand will be transcribed.

• Bacterial promoters normally consist of -35 and -10 (relative to the
transcriptional start) consensus sequences which are bound by a specific
sigma factor and RNA polymerase.

• Eukaryotic promoters are more complex. Most promoters utilized in
expression vectors are transcribed by RNA polymerase II. General
transcription factors (GTFs) first bind specific sequences near the
transcriptional start and then recruit the binding of RNA polymerase II. In
addition to these minimal promoter elements, small sequence elements are
recognized specifically by modular DNA-binding / trans-activating proteins
(e.g. AP-1, SP-1) which regulate the activity of a given promoter.

• Viral promoters may serve the same function as bacterial and eukaryotic
promoters. Upon viral infection of their host, viral promoters direct
transcription either by using host transcriptional machinery or by supplying
virally encoded enzymes to substitute part of the host machinery. Viral
promoters are recognised by the transcriptional machinery of a large number
of host organisms and are therefore often used in cloning and expression
vectors.

Promoters may furthermore comprise regulatory elements, which are DNA
sequence elements which act in conjunction with promoters and bind either
repressors (e.g., lacO/LAC Iq repressor system in E. coli) or inducers (e.g., gal1
/GAL4 inducer system in yeast). In either case, transcription is virtually "shut off"
until the promoter is derepressed or induced, at which point transcription is "turned
on". The choice of promoter in the cassette is primarily dependent on the host
organism into which the cassette is intended to be inserted. An important
requirement to this end is that the promoter should preferably be capable of
functioning in the host cell, in which the expressible nucleotide sequence is to be
expressed.

Preferably the promoter is an externally controllable promoter, such as an inducible
promoter and/or a repressible promoter. The promoter may be either controllable
(repressible/inducible) by chemicals such as the absence/presence of chemical
inducers, e.g. metabolites, substrates, metals, hormones, sugars. The promoter may
likewise be controllable by certain physical parameters such as temperature, pH,
redox status, growth stage, developmental stage, or the promoter may be
inducible/repressible by a synthetic inducer/repressor such as the gal inducer.

In order to avoid unintentional interference with the gene regulation systems of the
host cell, and in order to improve controllability of the co-ordinated gene expression
the promoter is preferably a synthetic promoter. Suitable promoters are described in
US 5,798,227, US 5,667,986. Principles for designing suitable synthetic eukaryotic
promoters are disclosed in US 5,559,027, US 5,877,018 or US 6,072,050.

Synthetic inducible eukaryotic promoters for the regulation of transcription of a gene
may achieve improved levels of protein expression and lower basal levels of gene
expression. Such promoters preferably contain at least two different classes of
regulatory elements, usually by modification of a native promoter containing one of
the inducible elements by inserting the other of the inducible elements. For example,
additional metal responsive elements (MREs) and/or glucocorticoid responsive
elements (GREs) may be provided to native promoters. Additionally, one or more
constitutive elements may be functionally disabled to provide the lower basal levels
of gene expression.



Preferred examples of promoters include but are not limited to those promoters being
induced and/or repressed by any factor selected from the group comprising
carbohydrates, e.g. galactose; low inorganic phosphate levels; temperature, e.g.
low or high temperature shift; metals or metal ions, e.g. copper ions; hormones, e.g.
dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39°C); methanol;
redox-status; growth stage, e.g. developmental stage; synthetic inducers, e.g. gal inducer.
Examples of such promoters include ADH 1, PGK 1, GAP 491, TPI, PYK, ENO,
PMA 1, PHO5, GAL 1, GAL 2, GAL 10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX,
MOX, SV40, CaMV, Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid,
TPI/α2 operator, AOX 1, MOX A.

More preferably, however, the promoter is selected from hybrid promoters such as
PGK/ARE hybrid, CYC/GRE hybrid or from synthetic promoters. Such promoters
can be controlled without interfering too much with the regulation of native genes in
the expression host.

In the following, examples of known yeast promoters that may be used in
conjunction with the present invention are shown. The examples are in no way
limiting and only serve to indicate to the skilled practitioner how to select or design
promoters that are useful according to the present invention.

Although numerous transcriptional promoters which are functional in yeasts have
been described in the literature, only some of them have proved effective for the
production of polypeptides by the recombinant route. There may be mentioned in
particular the promoters of the PGK genes (3-phosphoglycerate kinase), TDH genes
encoding GAPDH (glyceraldehyde phosphate dehydrogenase), TEF1 genes
(elongation factor 1), MFα1 (α sex pheromone precursor), which are considered as
strong constitutive promoters, or alternatively the regulatable promoter CYC1 which is
repressed in the presence of glucose or PHO5 which can be regulated by thiamine.
However, for reasons which are often unexplained, they do not always allow the
effective expression of the genes which they control. In this context, it is always
advantageous to be able to have new promoters in order to generate new effective
host/vector systems. Furthermore, having a choice of effective promoters in a given
cell also makes it possible to envisage the production of multiple proteins in this
same cell (for example several enzymes of the same metabolic chain) while
avoiding the problems of recombination between homologous sequences.

In general, a promoter region is situated in the 5' region of the genes and comprises
all the elements allowing the transcription of a DNA fragment placed under their
control, in particular

(1) a so-called minimal promoter region comprising the TATA box and the site of
initiation of transcription, which determines the position of the site of initiation as
well as the basal level of transcription. In Saccharomyces cerevisiae, the length
of the minimal promoter region is relatively variable. Indeed, the exact location of
the TATA box varies from one gene to another and may be situated from -40 to
-120 nucleotides upstream of the site of the initiation (Chen and Struhl, 1985,
EMBO J., 4, 3273-3280),

(2) sequences situated upstream of the TATA box (immediately upstream up to
several hundreds of nucleotides) which make it possible to ensure an effective
level of transcription either constitutively (relatively constant level of transcription
all along the cell cycle, regardless of the conditions of culture) or in a regulatable
manner (activation of transcription in the presence of an activator and/or
repression in the presence of a repressor). These sequences may be of several
types: activator, inhibitor, enhancer, inducer, repressor and may respond to
cellular factors or varied culture conditions.

Examples of such promoters are the ZZA1 and ZZA2 promoters disclosed in US
5,641,661, the EF1-α protein promoter and the ribosomal protein S7 gene promoter
disclosed in WO 97/44470, the COX 4 promoter and two unknown promoters (SEQ
ID No: 1 and 2 in the document) disclosed in US 5,952,195. Other useful promoters
include the HSP150 promoter disclosed in WO 98/54339 and the SV40 and RSV
promoters disclosed in US 4,870,013 as well as the PyK and GAPDH promoters
disclosed in EP 0 329 203 A1.



More preferably the invention employs the use of synthetic promoters. Synthetic
promoters are often constructed by combining the minimal promoter region of one
gene with the upstream regulating sequences of another gene. Enhanced promoter
control may be obtained by modifying specific sequences in the upstream regulating
sequences, e.g. through substitution or deletion or through inserting multiple copies
of specific regulating sequences. One advantage of using synthetic promoters is that
they can be controlled without interfering too much with the native promoters of the
host cell.
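The construction principle just described, upstream regulating sequences from one gene joined to the minimal promoter region of another, optionally with multiple copies of the regulating sequence, can be sketched symbolically. This is only an illustration of the element order: the `UAS(...)` and `MIN(...)` labels and the function name are hypothetical notation introduced here, and no real sequence data is involved.

```python
def hybrid_promoter(upstream_gene: str, minimal_gene: str, copies: int = 1) -> str:
    """Return a symbolic 5'->3' layout of a synthetic hybrid promoter.

    upstream_gene supplies the upstream regulating sequence, minimal_gene
    supplies the minimal promoter region (TATA box and initiation site).
    """
    if copies < 1:
        raise ValueError("at least one copy of the upstream element is needed")
    if upstream_gene == minimal_gene:
        raise ValueError("a hybrid promoter combines two different genes")
    parts = [f"UAS({upstream_gene})"] * copies + [f"MIN({minimal_gene})"]
    return "-".join(parts)


# PHO5 upstream element with a GAPDH minimal promoter, two upstream copies.
layout = hybrid_promoter("PHO5", "GAPDH", copies=2)
```

The PHO5/GAPDH pairing is used only because both genes are named in this document as promoter sources; the sketch says nothing about how strongly such a construct would be regulated.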



One such synthetic yeast promoter comprises promoters or promoter elements of
two different yeast-derived genes, yeast killer toxin leader peptide, and amino
terminus of IL-1β (WO 98/54339).

Another example of a yeast synthetic promoter is disclosed in US 5,436,136 (Hinnen
et al), which concerns a yeast hybrid promoter including a 5' upstream promoter
element comprising upstream activation site(s) of the yeast PHO5 gene and a 3'
downstream promoter element of the yeast GAPDH gene starting at nucleotide -300
to -180 and ending at nucleotide -1 of the GAPDH gene.

Another example of a yeast synthetic promoter is disclosed in US 5,089,398
(Rosenberg et al). This disclosure describes a promoter with the general formula
-(P.R.(2)-P.R.(1))-
wherein:

P.R.(1) is the promoter region proximal to the coding sequence and having the
transcription initiation site, the RNA polymerase binding site, and including the TATA
box, the CAAT sequence, as well as translational regulatory signals, e.g., capping
sequence, as appropriate;
P.R.(2) is the promoter region joined to the 5'-end of P.R.(1) associated with
enhancing the efficiency of transcription of the RNA polymerase binding region.

US 4,945,046 (Horii et al) discloses a further example of how to design a 
synthetic yeast promoter. This specific promoter comprises promoter elements 
derived both from yeast and from a mammal. The hybrid promoter consists 
essentially of Saccharomyces cerevisiae PHO5 or GAP-DH promoter from which the 
upstream activation site (UAS) has been deleted and replaced by the early 
enhancer region derived from SV40 virus. 

Co-ordinated expression of gene subsets can also be utilised to identify which 
heterologous genes are responsible for the production of a given phenotype. 

The following describes the sequence of steps to be taken from the isolation of 
mRNA to insertion into an entry vector for providing the cells according to the 
invention. In short, the sequence may include the following steps: 

i) isolating mRNA from an expression state, 

ii) obtaining substantially full length cDNA clones corresponding to the 
mRNA sequences, 

iii) inserting the substantially full length cDNA clones into a cloning site 
in a cassette in a primary vector, said cassette being of the general 
formula in 5'→3' direction: 
[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1'] 
wherein CS denotes a cloning site. 
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Expression cassettes 

The expression cassettes according to the present invention are preferably arranged 
as a cassette of nucleotides in a highly ordered sequence, the cassette having the 
general formula in 5'→3' direction: 

[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1'] 
wherein RS1 and RS1' denote restriction sites, RS2 and RS2' denote restriction 
sites different from RS1 and RS1', SP denotes a spacer sequence of at least two 
nucleotides, PR denotes a promoter, CS denotes a cloning site, and TR denotes a 
terminator, all of them being as discussed elsewhere in this specification. 
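
The ordered layout of the cassette can be sketched as a small data structure. This is only an illustrative sketch, not part of the disclosed method: the class name, field names and the short placeholder sequences are hypothetical, and it assumes palindromic restriction sites so that RS1' and RS2' carry the same sequence as RS1 and RS2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """Hypothetical model of [RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']."""
    rs1: str           # outer restriction site (RS1 and RS1')
    rs2: str           # inner restriction site (RS2 and RS2'), must differ from RS1
    spacer_5: str      # spacer between RS2 and the promoter
    promoter: str      # PR
    cloning_site: str  # CS
    terminator: str    # TR
    spacer_3: str      # spacer between the terminator and RS2'

    def __post_init__(self):
        # RS2 is defined as a restriction site different from RS1.
        if self.rs1 == self.rs2:
            raise ValueError("RS2 must differ from RS1")
        # Each spacer is defined as at least two nucleotides.
        if len(self.spacer_5) < 2 or len(self.spacer_3) < 2:
            raise ValueError("each spacer needs at least two nucleotides")

    def elements(self):
        """Return the nine elements in 5'->3' order."""
        return [self.rs1, self.rs2, self.spacer_5, self.promoter,
                self.cloning_site, self.terminator, self.spacer_3,
                self.rs2, self.rs1]

    def sequence(self):
        """Concatenate the elements into one nucleotide string."""
        return "".join(self.elements())


# Placeholder sequences for illustration only (not real promoter or terminator sequences).
example = Cassette(rs1="GCCCGGGC", rs2="GGCGCGCC", spacer_5="AT",
                   promoter="TATAAA", cloning_site="GAATTC",
                   terminator="AATAAA", spacer_3="CG")
```

Validating the two constraints at construction time keeps an invalid cassette (identical RS1 and RS2, or a spacer shorter than two nucleotides) from entering any later concatenation step.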

It is an advantage to have two different restriction sites flanking both sides of the 
expression construct. By treating the primary vectors with restriction enzymes 
cleaving both restriction sites, the expression construct and the primary vector will 
be left with two non-compatible ends. This facilitates a concatenation process, since 
the empty vectors do not participate in the concatenation of expression constructs. 

In principle, any restriction site for which a restriction enzyme is known can be 
used. These include the restriction enzymes generally known and used in the field of 
molecular biology such as those described in Sambrook, Fritsch, Maniatis, "A 
Laboratory Manual", 2nd edition, Cold Spring Harbor Laboratory Press, 1989. 

The restriction site recognition sequences preferably are of a substantial length, so 
that the likelihood of occurrence of an identical restriction site within the cassette is 
minimised. Thus the first restriction site may comprise at least 6 bases, but more 
preferably the recognition sequence comprises at least 7 or 8 bases. Restriction 
sites having 7 or more non-N bases in the recognition sequence are generally 
known as "rare restriction sites" (see example 13). However, the recognition 
sequence may also be at least 10 bases, such as at least 15 bases, for example at 
least 16 bases, such as at least 17 bases, for example at least 18 bases, such as at 
least 19 bases, for example at least 20 bases, such 
as at least 21 bases, for example at least 22 bases, such as at least 23 bases, for 
example at least 25 bases, such as at least 30 bases, for example at least 35 bases, 
such as at least 40 bases, for example at least 45 bases, such as at least 50 bases. 
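
The effect of recognition sequence length on the likelihood of an unwanted internal site can be illustrated with a simple estimate. The sketch below is not taken from the specification: it assumes a random sequence with equal frequencies of the four bases and a recognition sequence made up only of specified (non-N) bases, and the function name is hypothetical.

```python
def expected_sites(sequence_length: int, site_length: int) -> float:
    """Expected number of occurrences of one fully specified recognition
    sequence in a random sequence with equal base frequencies (assumption)."""
    if site_length < 1 or sequence_length < site_length:
        return 0.0
    # Number of positions at which the site could start.
    positions = sequence_length - site_length + 1
    # Each position matches with probability (1/4) ** site_length.
    return positions / 4 ** site_length


# Compare a 6-base and an 8-base site in a 10 000-base sequence.
six_base = expected_sites(10_000, 6)
eight_base = expected_sites(10_000, 8)
```

Under these assumptions each additional specified base lowers the expected number of chance occurrences roughly fourfold, which is why longer recognition sequences are preferred for the flanking sites.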
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Preferably the first restriction site RSI and RSI' is recognised by a rBstriction 
enzyme generating blunt ends of the double stranded nucleotide sequences. By 
generating blunt ends at this site, the risk that the vector participates in a 
subsequent concatenation is greatly reduced. The first restriction site may also give 
rise to sticky ends, but these are then preferably non-compatible to the sticky ends 
resulting from the second restriction site, RS2 and RS2'. 

According to a preferred embodiment of the invention, the second restriction site, 
RS2 and RS2', comprises a rare restriction site. The longer the recognition 
sequence of the rare restriction site, the rarer it is and the less likely it is that the 
restriction enzyme recognising it will cleave the nucleotide sequence at other, 
undesired positions. 

The rare restriction site may furthermore serve as a PCR priming site. Thereby it is 
possible to copy the cassettes via PCR techniques and thus indirectly "excise" the 
cassettes from a vector. 



Single-stranded compatible ends may be created by digestion with restriction 
enzymes. For concatenation a preferred enzyme for excising the cassettes would be a 
rare cutter, i.e. an enzyme that recognises a sequence of 7 or more nucleotides. 
Examples of enzymes that cut very rarely are the meganucleases, many of which 
are intron encoded, like e.g. I-Ceu I, I-Sce I, I-Ppo I, and PI-Psp I (see example 13d 
for more). Other preferred enzymes recognize a sequence of 8 nucleotides like e.g. 
Asc I, AsiS I, CciN I, CspB I, Fse I, MchA I, Not I, Pac I, Sbf I, Sda I, Sgf I, SgrA I, 
Sse232 I, and Sse8387 I, all of which create single stranded, palindromic compatible 
ends. 



Other preferred rare cutters, which may also be used to control orientation of 
individual cassettes in the concatemer, are enzymes that recognize non-palindromic 
sequences like e.g. Aar I, Sap I, Sfi I, Sdi I, and Vpa (see example 13c for more). 
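
Whether a recognition sequence is palindromic, meaning it equals its own reverse complement, can be checked mechanically. The following is a minimal sketch; the function names are hypothetical, and the recognition sequences used (GCGGCCGC for Not I, GGCGCGCC for Asc I, GCTCTTC for Sap I) are the commonly cited ones, not taken from this specification.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the reverse complement of a DNA sequence of A, C, G and T."""
    return site.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the sequence reads the same on both strands in 5'->3' direction."""
    return site.upper() == reverse_complement(site)


# Commonly cited recognition sequences (assumed, see the note above).
not_i = "GCGGCCGC"
asc_i = "GGCGCGCC"
sap_i = "GCTCTTC"
```

A site that is not palindromic reads differently on the two strands, which is consistent with the use of such enzymes to control cassette orientation.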

Alternatively, cassettes can be prepared by the addition of restriction sites to the 
ends, e.g. by PCR or ligation to linkers (short synthetic dsDNA molecules). 
Restriction enzymes are continuously being isolated and characterised and it is 
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anticipated that many of such novel enzymes can be used to generate single- 
stranded compatible ends according to the present invention. 

It is conceivable that single stranded compatible ends can be made by cleaving the 
vector with synthetic cutters. Thus, a reactive chemical group that will normally be 
able to cleave DNA unspecifically can cut at specific positions when coupled to 
another molecule that recognises and binds to specific sequences. Examples of 
molecules that recognise specific dsDNA sequences are DNA, PNA, LNA, 
phosphorothioates, peptides, and amides. See e.g. Armitage, B. (1998) Chem. Rev. 
98: 1171-1200, who describes photocleavage using e.g. anthraquinone and UV 
light; Dervan P.B. & Bürli R.W. (1999) Curr. Opin. Chem. Biol. 3: 688-93, which describes 
the specific binding of polyamides to DNA; Nielsen, P.E. (2001) Curr. Opin. 
Biotechnol. 12: 16-20, which describes the specific binding of PNA to DNA; and Chemical 
Reviews special thematic issue: RNA/DNA Cleavage (1998) vol. 98 (3) Bashkin J.K. 
(ed.) ACS publications, which describes several examples of chemical DNA cleavers. 

Single-stranded compatible ends may also be created e.g. by using PCR primers 
including dUTP and then treating the PCR product with Uracil-DNA glycosylase 
(Ref: US 5,035,996) to degrade part of the primer. Alternatively, compatible ends 
can be created by tailing both the vector and insert with complementary nucleotides 
using Terminal Transferase (Chang LMS, Bollum TJ (1971) J Biol Chem 246:909). 

The spacer sequence located between the RS2 and the PR sequence is preferably 
a non-transcribed spacer sequence. The purpose of the spacer sequence(s) is to 
minimise recombination between different concatemers present in the same cell or 
between cassettes present in the same concatemer, but it may also serve the 
purpose of making the nucleotide sequences in the cassettes more "host"-like. A further 
purpose of the spacer sequence is to reduce the occurrence of hairpin formation 
between adjacent palindromic sequences, which may occur when cassettes are 
assembled head to head or tail to tail. Spacer sequences may also be convenient 
for introducing short conserved nucleotide sequences that may serve e.g. as PCR 
primer sites or as target for hybridization to e.g. nucleic acid or PNA or LNA probes 
allowing affinity purification of cassettes. 
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The cassette may also optionally comprise another spacer sequence of at least two 
nucleotides between TR and RS2. When cassettes are cut out from a vector and 
concatenated into concatemers of cassettes, the spacer sequences together ensure 
that there is a certain distance between two successive identical promoter or 
terminator sequences. This distance may comprise at least 50 bases, such as at 
least 60 bases, for example at least 75 bases, such as at least 100 bases, for 
example at least 150 bases, such as at least 200 bases, for example at least 250 
bases, such as at least 300 bases, for example at least 400 bases, for example at 
least 500 bases, such as at least 750 bases, for example at least 1000 bases, such 
as at least 1100 bases, for example at least 1200 bases, such as at least 1300 
bases, for example at least 1400 bases, such as at least 1500 bases, for example at 
least 1600 bases, such as at least 1700 bases, for example at least 1800 bases, 
such as at least 1900 bases, for example at least 2000 bases, such as at least 2100 
bases, for example at least 2200 bases, such as at least 2300 bases, for example at 
least 2400 bases, such as at least 2500 bases, for example at least 2600 bases, 
such as at least 2700 bases, for example at least 2800 bases, such as at least 2900 
bases, for example at least 3000 bases, such as at least 3200 bases, for example at 
least 3500 bases, such as at least 3800 bases, for example at least 4000 bases, 
such as at least 4500 bases, for example at least 5000 bases, such as at least 6000 
bases. 

The number of nucleotides between the spacer located 5' to the PR sequence 
and the one located 3' to the TR sequence may be any number. However, it may be 
advantageous to ensure that at least one of the spacer sequences comprises 
between 100 and 2500 bases, preferably between 200 and 2300 bases, more 
preferably between 300 and 2100 bases, such as between 400 and 1900 bases, 
more preferably between 500 and 1700 bases, such as between 600 and 1500 
bases, more preferably between 700 and 1400 bases. 

If the intended host cell is yeast, the spacers present in a concatemer should 
preferably comprise a combination of a few ARSes with varying lambda phage DNA 
fragments. 

Preferred examples of spacer sequences include but are not limited to: Lambda 
phage DNA, prokaryotic genomic DNA such as E. coli genomic DNA, ARSes. 
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The cloning site in the cassette in the primary vector should be designed so that any 
nucleotide sequence can be cloned into it. 

The cloning site in the cassette preferably allows directional cloning. This 
ensures that transcription in a host cell is performed from the coding strand in the 
intended direction and that the translated peptide is identical to the peptide for which 
the original nucleotide sequence codes. 

However, according to some embodiments it may be advantageous to insert the 
sequence in the opposite direction. According to these embodiments, so-called 
antisense constructs may be inserted which prevent functional expression of specific 
genes involved in specific pathways. Thereby it may become possible to divert 
metabolic intermediates from a prevalent pathway to another less dominant 
pathway. 

The cloning site in the cassette may comprise multiple cloning sites, generally 
known as MCS or polylinker sites, which is a synthetic DNA sequence encoding a 
series of restriction endonuclease recognition sites. These sites are engineered for 
convenient cloning of DNA into a vector at a specific position and for directional 
cloning of the insert. 

Cloning of cDNA does not have to involve the use of restriction enzymes. Other 
alternative systems include but are not limited to: 

- Creator™ Cre-loxP system from Clontech, which uses recombination and loxP 
sites 

- use of Lambda attachment sites (att-λ), such as the Gateway™ system from Life 
Technologies. 

Both of these systems are directional. 

The role of the terminator sequence is to limit transcription to the length of the 
coding sequence. An optimal terminator sequence is thus one which is capable of 
performing this act in the host cell. 
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In prokaryotes, sequences known as transcriptional temninators signal the RNA 
polymerase to release the DNA template and stop transcription of the nascent RNA. 

In eukaryotes, RNA molecules are transcribed well beyond the end of the mature 
mRNA molecule. New transcripts are enzymatically cleaved and modified by the 
addition of a long sequence of adenylic acid residues known as the poly-A tail. A 
polyadenylation consensus sequence is located about 10 to 30 bases upstream 
from the actual cleavage site. 

Preferred examples of yeast derived terminator sequences include, but are not 
limited to: ADN1, CYC1, GPD, ADH1 alcohol dehydrogenase. 

Depending on the nature of the host cell, it may be advantageous that at least one 
cassette comprises an intron between the promoter and the expressible nucleotide 
sequence, more preferably that substantially all cassettes comprise an intron 
between the promoter and the expressible nucleotide sequence. The choice of 
intron sequence depends on requirements of the host cell. 

Thus, optionally the cassette in the vector comprises an intron sequence, which may 
be located 5' or 3' to the expressible nucleotide sequence. The design and layout of 
introns is well known in the art. The choice of intron design largely depends on the 
intended host cell, in which the expressible nucleotide sequence is eventually to be 
expressed. The effects of having an intron sequence in the expression cassettes are 
those generally associated with intron sequences. 

Examples of yeast introns can be found in the literature and in specific databases 
such as Ares Lab Yeast Intron Database (Version 2.1) as updated on 15 April 2000. 
Earlier versions of the database as well as extracts of the database have been 
published in: "Genome-wide bioinformatic and molecular analysis of introns in 
Saccharomyces cerevisiae." by Spingola M, Grate L, Haussler D, Ares M Jr. (RNA 
1999 Feb;5(2):221-34) and "Test of intron predictions reveals novel splice sites, 
alternatively spliced mRNAs and new introns in meiotically regulated genes of 
yeast." by Davis CA, Grate L, Spingola M, Ares M Jr. (Nucleic Acids Res 2000 Apr 
15;28(8):1700-6). 
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Primary vectors (entry vectors) 

By the term entry vector is meant a vector for storing and amplifying cDNA or other 
expressible nucleotide sequences using the cassettes according to the present 
invention. The entry vectors or primary vectors are preferably able to propagate in E. 
coli or any other suitable standard host cell. They should preferably be amplifiable and 
amenable to standard normalisation and enrichment procedures. 

The entry vector may be of any type of DNA that has the basic requirements of a) 
being able to replicate itself in at least one suitable host organism, b) allowing 
insertion of foreign DNA which is then replicated together with the vector, and c) 
preferably allowing selection of vector molecules that contain insertions of said foreign 
DNA. In a preferred embodiment the vector is able to replicate in standard hosts like 
yeasts and bacteria, and it should preferably have a high copy number per host cell. It is 
also preferred that the vector, in addition to a host-specific origin of replication, 
contains an origin of replication for a single stranded virus, such as e.g. the f1 origin 
for filamentous phages. This will allow the production of single stranded nucleic acid, 
which may be useful for normalisation and enrichment procedures of cloned 
sequences. A vast number of commonly used cloning vectors have been described, 
and reference may be given to e.g. Sambrook J., Fritsch E.F. and 
Maniatis T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor 
Laboratory Press, USA; the Netherlands Culture Collection of Bacteria 
(www.cbs.knaw.nl/NCCB/collection.htm); or the Department of Microbial Genetics, 
National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan 
(www.shigen.nig.ac.jp/cvector/cvector.html). A few type-examples that are the 
parents of many popular derivatives are M13mp10, pUC18, Lambda gt 10, and 
pYAC4. Examples of primary vectors include but are not limited to M13K07, 
pBR322, pUC18, pUC19, pUC118, pUC119, pSP64, pSP65, pGEM-3, pGEM-3Z, 
pGEM-3Zf(-), pGEM-4, pGEM-4Z, πAN13, pBluescript II, CHARON 4A, λ+, 
CHARON 21A, CHARON 32, CHARON 33, CHARON 34, CHARON 35, CHARON 
40, EMBL3A, λ2001, λDASH, λFIX, λgt10, λgt11, λgt18, λgt20, λgt22, λORF8, 
λZAP/R, pJB8, c2RB, pcos1EMBL. 



Methods for cloning of cDNA or genomic DNA into a vector are well known in the 
art. Reference may be given to J. Sambrook, E.F. Fritsch, T. Maniatis: Molecular 
Cloning, A Laboratory Manual (2nd edition, Cold Spring Harbor Laboratory Press, 
1989). 

One example of a circular model entry vector is described in Figure 11. The vector, 
EVE, contains the expression cassette R1-R2-Spacer-Promoter-Multi Cloning Site- 
Terminator-Spacer-R2-R1. The vector furthermore contains a gene for ampicillin 
resistance, AmpR, and an origin of replication for E. coli, ColE1. 

The entry vectors EVE4, EVE5, and EVE8 are shown in Figures 12, 13 and 14. These 
all contain SrfI as R1 and AscI as R2. Both of these sites are palindromic and are 
regarded as rare restriction sites having 8 bases in the recognition sequence. The 
vectors furthermore contain the AmpR ampicillin resistance gene, and the ColE1 
origin of replication for E. coli as well as f1, which is an origin of replication for 
filamentous phages, such as M13. EVE4 (Fig. 12) contains the MET25 promoter 
and the ADH1 terminator. Spacer 1 and spacer 2 are short sequences deriving from 
the multiple cloning site, MCS. EVE5 (Fig. 13) contains the CUP1 promoter and the 
ADH1 terminator. EVE8 (Fig. 14) contains the CUP1 promoter and the ADH1 
terminator. The spacers of EVE8 are a 550 bp lambda phage DNA (spacer 3) and 
an ARS sequence from yeast (spacer 4). 

Nucleotide library (entry library) 

The steps leading from an expression state to a nucleotide 
library are illustrated schematically in Figure 9. 

Methods as well as suitable vectors and host cells for constructing and maintaining 
a library of nucleotide sequences in a cell are well known in the art. The primary 
requirement for the library is that it should be possible to store and amplify in it a 
number of primary vectors (constructs) according to this invention, the vectors 
(constructs) comprising expressible nucleotide sequences from at least one 
expression state and wherein at least two vectors (constructs) are different. 






One specific example of such a library is the well known and widely employed cDNA 
library. The advantage of the cDNA library is mainly that it contains only DNA 
sequences corresponding to transcribed messenger RNA in a cell. Suitable methods 
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are also present to purify the isolated mRNA or the synthesised cDNA so that only 
substantially full-length cDNA is cloned into the library. 

Methods for optimisation of the process to yield substantially full length cDNA may 
comprise size selection, e.g. electrophoresis, chromatography, precipitation or may 
comprise ways of increasing the likelihood of getting full length cDNAs, e.g. the 
SMART™ method (Clontech) or the CapTrap™ method (Stratagene). 

Preferably the method for making the nucleotide library comprises obtaining a 
substantially full length cDNA population comprising a normalised representation of 
cDNA species. More preferably a substantially full length cDNA population 
comprises a normalised representation of cDNA species characteristic of a given 
expression state. 

Normalisation reduces the redundancy of clones representing abundant mRNA 
species and increases the relative representation of clones from rare mRNA 
species. 

Methods for normalisation of cDNA libraries are well known in the art. Reference 
may be given to suitable protocols for normalisation such as those described in US 
5,763,239 (DIVERSA) and WO 95/08647 and WO 95/11986, and Bonaldo, Lennon, 
Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol 
Reporter, 2000, 18:123-132. 

Enrichment methods are used to isolate clones representing mRNA which are 
characteristic of a particular expression state. A number of variations of the method 
broadly termed subtractive hybridisation are known in the art. Reference may be 
given to Sive, John, Nucleic Acid Res, 1988, 16:10937; Diatchenko, Lau, Campbell 
et al., PNAS, 1996, 93:6025-6030; Carninci, Shibata, Hayatsu, Genome Res, 2000, 
10:1617-30; Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali, 
Holloway, Taylor, Plant Mol Biol Reporter, 2000, 18:123-132. For example, 
enrichment may be achieved by doing additional rounds of hybridization similar to 
normalization procedures, using e.g. cDNA from a library of abundant clones or 
simply a library representing the uninduced state as a driver against a tester library 
from the induced state. Alternatively mRNA or PCR amplified cDNA derived from the 
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expression state of choice can be used to subtract common sequences from a tester 
library. The choice of driver and tester population will depend on the nature of target 
expressible nucleotide sequences in each particular experiment. 

Finally, enrichment may be achieved by subtractive hybridisation followed by colony 
picking. 

In the library an expressible nucleotide sequence coding for one peptide is 
preferably found in different but similar vectors under the control of different 
promoters. Preferably the library comprises at least three primary vectors with an 
expressible nucleotide sequence coding for the same peptide under the control of 
three different promoters. More preferably the library comprises at least four primary 
vectors with an expressible nucleotide sequence coding for the same peptide under 
the control of four different promoters. More preferably the library comprises at least 
five primary vectors with an expressible nucleotide sequence coding for the same 
peptide under the control of five different promoters, such as comprises at least six 
primary vectors with an expressible nucleotide sequence coding for the same 
peptide under the control of six different promoters, for example comprises at least 
seven primary vectors with an expressible nucleotide sequence coding for the same 
peptide under the control of seven different promoters, for example comprises at 
least eight primary vectors with an expressible nucleotide sequence coding for the 
same peptide under the control of eight different promoters, such as comprises at 
least nine primary vectors with an expressible nucleotide sequence coding for the 
same peptide under the control of nine different promoters, for example comprises 
at least ten primary vectors with an expressible nucleotide sequence coding for the 
same peptide under the control of ten different promoters. 

The expressible nucleotide sequence coding for the same peptide preferably 
comprises essentially the same nucleotide sequence, more preferably the same 
nucleotide sequence. 

By having a library with what may be termed one gene under the control of a 
number of different promoters in different vectors, it is possible to construct from the 
nucleotide library an array of combinations of genes and promoters. Preferably, one 
library comprises a complete or substantially complete combination such as a two 
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dimensional array of genes and promoters, wherein substantially all genes are found 
under the control of substantially all of a selected number of promoters. 

According to another embodiment of the invention the nucleotide library comprises 
combinations of expressible nucleotide sequences combined in different vectors 
with different spacer sequences and/or different intron sequences. Thus any one 
expressible nucleotide sequence may be combined in a two, three, four or five 
dimensional array with different promoters and/or different spacers and/or different 
introns and/or different terminators. The two, three, four or five dimensional array 
may be complete or incomplete, since not all combinations will have to be present. 
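
The notion of a complete or incomplete array of genes and promoters can be illustrated with a short sketch. It is illustrative only: the function names and the gene names are hypothetical, and the promoter names are simply reused from the entry vectors described earlier.

```python
from itertools import product


def gene_promoter_array(genes, promoters):
    """Return every (gene, promoter) pairing, i.e. the complete
    two dimensional array."""
    return list(product(genes, promoters))


def array_coverage(present, genes, promoters):
    """Fraction of the complete array that is represented in a library."""
    complete = set(product(genes, promoters))
    if not complete:
        raise ValueError("at least one gene and one promoter are required")
    return len(complete & set(present)) / len(complete)


genes = ["geneA", "geneB", "geneC"]   # hypothetical gene names
promoters = ["MET25", "CUP1"]         # promoter names reused for illustration
library = gene_promoter_array(genes, promoters)
```

Adding further dimensions (spacers, introns, terminators) amounts to passing more lists to `product`, and an incomplete array is simply a subset of the result.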

The library may suitably be maintained in a host cell comprising prokaryotic cells or 
eukaryotic cells. Preferred prokaryotic host organisms may include but are not 
limited to Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces 
coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus. 

Yeast species such as Saccharomyces cerevisiae (budding yeast), 
Schizosaccharomyces pombe (fission yeast), Pichia pastoris, and Hansenula 
polymorpha (methylotrophic yeasts) may also be used. Filamentous ascomycetes, 
such as Neurospora crassa and Aspergillus nidulans may also be used. Plant cells 
such as those derived from Nicotiana and Arabidopsis are preferred. Preferred 
mammalian host cells include but are not limited to those derived from humans, 
monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 
293, VERO, HeLa etc. (see Kriegler M. in "Gene Transfer and Expression: A 
Laboratory Manual", New York, Freeman & Co. 1990). 

Concatemers 

For the purposes of providing a method for assembling multiple expression 
cassettes ("cassettes") into a single host cell, and allowing their facile remixing 
between cells, the expression cassettes are assembled into concatemers. 

A concatemer is a series of linked units. The concatemers according to the invention 
may comprise a selection of expressible nucleotide sequences from just one 
expression state and can thus be assembled from one library representing this 
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expression state or it may comprise cassettes from a number of different expression 
states. The concatemers according to the invention are especially suitable for 
ligating into an artificial chromosome, which may be inserted into a host cell for 
coordinated expression. For this purpose, the variation among and between 
cassettes may be such as to minimise the chance of cross over as the host cell 
undergoes cell division, such as through minimising the level of repeat sequences 
occurring in any one concatemer, since it is not an object of this embodiment of the 
invention to obtain recombination of concatemers with a segment in the host 
genome or an epitope of the host cells nor is it an object to obtain intra- or extra 
concatemeric recombination. 



According to a preferred embodiment of the invention the concatemer comprises at 
least a first cassette and a second cassette, said first cassette being different from 
said second cassette. More preferably, the concatemer comprises cassettes, 
wherein substantially all cassettes are different. The difference between the 
cassettes may arise from differences between promoters, and/or expressible 
nucleotide sequences, and/or spacers, and/or introns and/or terminators. 

The number of cassettes in a single concatemer is largely determined by the host 
species into which the concatemer is eventually to be inserted and the vector 
through which the insertion is carried out. The concatemer thus may comprise at 
least 10 cassettes, such as at least 15, for example at least 20, such as at least 25, 
for example at least 30, such as from 30 to 60 or more than 60, such as at least 75, 
for example at least 100, such as at least 200, for example at least 500, such as at 
least 750, for example at least 1000, such as at least 1500, for example at least 
2000 cassettes. 

Each of the cassettes may be laid out as described above. 

Thus, in a preferred embodiment a concatemer is used to denote a number of 
serially linked nucleotide cassettes, wherein at least two of the serially linked 
nucleotide units comprise a cassette having the basic structure 

[rs2-SP-PR-X-TR-SP-rs1] 

wherein 

rs1 and rs2 together denote a restriction site, 
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SP denotes a spacer of at least two nucleotide bases, 
PR denotes a promoter, capable of functioning in a cell, 
X denotes an expressible nucleotide sequence, 
TR denotes a terminator, and 
SP denotes a spacer of at least two nucleotide bases, 

wherein the variables of the cassette have the meaning as defined elsewhere in this 
specification. Optionally the cassettes comprise an intron sequence between the 
promoter and the expressible nucleotide sequence and/or between the terminator 
and the expressible nucleotide sequence as discussed above. 

According to one aspect of the invention, a concatemer comprises cassettes with 
expressible nucleotide sequences from different expression states, so that non-naturally 
occurring combinations or non-native combinations of expressible nucleotide 
sequences are obtained. 

According to a preferred embodiment of the invention the concatemer comprises at 
least a first cassette and a second cassette, said first cassette being different from 
said second cassette. More preferably, the concatemer comprises cassettes, 
wherein substantially all cassettes are different. The difference between the 
cassettes may arise from differences between promoters, and/or expressible 
nucleotide sequences, and/or spacers, and/or terminators, and/or introns. 

The concatenation may be carried out in different ways. 

Cassettes to be concatenated are normally excised from a vector or they are 
synthesised through PCR. After excision the cassettes may be separated from the 
vector through size fractionation such as gel filtration or through tagging of known 
sequences in the cassettes. The isolated cassettes may then be ligated together 
either through interaction between sticky ends or through ligation of blunt ends. 

More preferably the cassettes may be concatenated without an intervening 
purification step through excision from a vector with two restriction enzymes, one 
leaving sticky ends on the cassettes and the other one leaving blunt ends in the 
vectors. 






An alternative way of producing concatemers free of vector sequences would be to 
PCR amplify the cassettes from a single stranded primary vector. The PCR product 
must include the restriction sites RS2 and RS2', which are subsequently cleaved by 
their cognate enzyme(s). Concatenation can then be performed using the digested 
PCR product, essentially without interference from the single stranded primary 
vector template or the small double stranded fragments which have been cut from 
the ends. 



When the vectors comprising the cassettes are single stranded, the cassettes may
be excised and be made double stranded through PCR techniques, which only
prime the cassette sequence and not the vector sequence. Sticky ends can be
made by cleaving with a restriction enzyme leaving sticky ends and the cassettes
can be assembled without interaction from the single stranded vector fragments.

The concatemer may be assembled or concatenated by concatenation of at least
two cassettes of nucleotide sequences, each cassette comprising a first sticky end, a
spacer sequence, a promoter, an expressible nucleotide sequence, a terminator,
and a second sticky end.

After concatenation has been completed, concatemers of the desired size may be
selected through size selection, such as selection for concatemers having at least
10 cassettes, such as at least 15, for example at least 20, such as at least 25, for
example at least 30, such as from 30 to 60 or more than 60, such as at least 75, for
example at least 100, such as at least 200, for example at least 500, such as at
least 750, for example at least 1000, such as at least 1500, for example at least
2000 cassettes. The number of cassettes in each concatemer may be controlled by
size fractionation after concatenation, since the size of the concatemers is
approximately proportional to the number of cassettes.
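
As an illustration of the proportionality noted above, the following minimal Python
sketch converts a concatemer size into an approximate cassette count, and a
cassette threshold into a size cut-off for fractionation. The mean cassette length of
2000 bp and the function names are hypothetical values chosen for the example;
they are not taken from the present description.

```python
def estimate_cassette_count(concatemer_bp: float, mean_cassette_bp: float) -> int:
    """Approximate number of cassettes in a concatemer of the given size."""
    if concatemer_bp <= 0 or mean_cassette_bp <= 0:
        raise ValueError("sizes must be positive")
    return round(concatemer_bp / mean_cassette_bp)


def size_cutoff_bp(min_cassettes: int, mean_cassette_bp: float) -> float:
    """Smallest concatemer size expected to hold at least min_cassettes cassettes."""
    return min_cassettes * mean_cassette_bp


# Hypothetical mean cassette length (an assumption, not from the description).
MEAN_CASSETTE_BP = 2000

print(estimate_cassette_count(60_000, MEAN_CASSETTE_BP))
print(size_cutoff_bp(10, MEAN_CASSETTE_BP))
```

Because real cassettes differ in length, such an estimate is only as good as the
assumed mean length.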

Preferably at least one inserted concatemer in each cell comprises a selectable
marker. Selectable markers generally provide a means to select, for growth, only
those cells which contain a vector. Such markers are of two types: drug resistance
and auxotrophic. A drug resistance marker enables cells to detoxify an exogenously
added drug that would otherwise kill the cell. Auxotrophic markers allow cells to grow
in media lacking an essential component by enabling cells to synthesise the
essential component (usually an amino acid).

Illustrative and non-limiting examples of common selectable markers with a brief 
description of their mode of action follow: 

Prokaryotic

• Ampicillin: interferes with a terminal reaction in bacterial cell wall synthesis. The resistance gene (bla)
encodes beta-lactamase, which cleaves the beta-lactam ring of the antibiotic, thus detoxifying it.

• Tetracycline: prevents bacterial protein synthesis by binding to the 30S ribosomal subunit. The resistance
gene (tet) specifies a protein that modifies the bacterial membrane and prevents transport of the antibiotic
into the cell.

• Kanamycin: binds to the 70S ribosomes and causes misreading of messenger RNA. The resistance gene
(nptII) modifies the antibiotic and prevents interaction with the ribosome.

• Streptomycin: binds to the 30S ribosomal subunit, causing misreading of messenger RNA. The resistance
gene (Sm) modifies the antibiotic and prevents interaction with the ribosome.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and cleaves it. The Zeocin
resistance gene encodes a 13,665 dalton protein. This protein confers resistance to Zeocin by binding to
the antibiotic and preventing it from binding DNA. Zeocin is effective on most aerobic cells and can be
used for selection in mammalian cell lines, yeast and bacteria.

• Auxotrophic markers.

Eukaryotic

• Hygromycin: an aminocyclitol that inhibits protein synthesis by disrupting ribosome translocation and
promoting mistranslation. The resistance gene (hph) detoxifies hygromycin B by phosphorylation.

• Histidinol: cytotoxic to mammalian cells by inhibiting histidyl-tRNA synthesis in histidine-free media. The
resistance gene (hisD) product inactivates histidinol toxicity by converting it to the essential amino acid,
histidine.

• Neomycin (G418): blocks protein synthesis by interfering with ribosomal functions. The resistance gene
ADH encodes aminoglycoside phosphotransferase, which detoxifies G418.

• Uracil: laboratory yeast strains carrying a mutated gene which encodes orotidine-5'-phosphate
decarboxylase, an enzyme essential for uracil biosynthesis, are unable to grow in the absence of
exogenous uracil. A copy of the wild-type gene (ura4+, S. pombe, or URA3, S. cerevisiae) carried on the
vector will complement this defect in transformed cells.

• Adenosine: laboratory strains carrying a deficiency in adenosine synthesis may be complemented by a
vector carrying the wild-type gene, ADE2.

• Amino acids: vectors carrying the wild-type genes for LEU2, TRP1, HIS3 or LYS2 may be used to
complement strains of yeast deficient in these genes.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and cleaves it. The Zeocin
resistance gene encodes a 13,665 dalton protein. This protein confers resistance to Zeocin by binding to
the antibiotic and preventing it from binding DNA. Zeocin is effective on most aerobic cells and can be
used for selection in mammalian cell lines, yeast and bacteria.

The number of concatemers in one single cell may be at least one concatemer per
cell, preferably at least 2 concatemers per cell, more preferably 3 per cell, such as 4
per cell, more preferably 5 per cell, such as at least 5 per cell, for example at least 6
per cell, such as 7, 8, 9 or 10 per cell, for example more than 10 per cell. As
described above, each concatemer may preferably comprise up to 1000 cassettes,
and it is envisaged that one concatemer may comprise up to 2000 cassettes. By
inserting up to 10 concatemers into one single cell, this cell may thus be enriched
with up to 20,000 new expressible genes, which under suitable conditions may be
turned on and off by regulation of the regulatable promoters. However it may be
more preferable to provide cells having anywhere between 10 and 1000 novel
genes, such as 20-900 novel genes, for example 30 to 800 novel genes, such as 40
to 700 novel genes, for example 50 to 600 novel genes, such as from 60 to 300
novel genes. The genes may advantageously be located on 1 to 10 such as from 2
to 5 different concatemers in the cells. Each concatemer may advantageously
comprise from 10 to 1000 genes, such as from 10 to 750 genes, such as from 10 to
500 genes, such as from 10 to 200 genes, such as from 20 to 100 genes, for
example from 30 to 60 genes.

The concatemers may be inserted into the host cells according to any known
transformation technique, preferably according to such transformation techniques
that ensure stable and not transient transformation of the host cell. The concatemers
may thus be inserted as an artificial chromosome which is replicated by the cells as
they divide or they may be inserted into the chromosomes of the host cell. The
concatemer may also be inserted in the form of a plasmid such as a plasmid vector,
a phage vector, a viral vector, a cosmid vector, that is replicated by the cells as they
divide. Any combination of the three insertion methods is also possible. One or more
concatemers may thus be integrated into the chromosome(s) of the host cell and
one or more concatemers may be inserted as plasmids or artificial chromosomes.
One or more concatemers may be inserted as artificial chromosomes and one or
more may be inserted into the same cell via a plasmid.

The basic requirements for a functional artificial chromosome have been described
in US 4,464,472, the contents of which are hereby incorporated by reference. An
artificial chromosome or a functional minichromosome, as it may also be termed,
must comprise a DNA sequence capable of replication and stable mitotic
maintenance in a host cell comprising a DNA segment coding for centromere-like
activity during mitosis of said host and a DNA sequence coding for a replication site
recognized by said host.

Suitable artificial chromosomes include a Yeast Artificial Chromosome (YAC) (see
e.g. Murray et al. Nature 305:189-193; or US 4,464,472), a mega Yeast Artificial
Chromosome (mega YAC), a Bacterial Artificial Chromosome (BAC), a mouse
artificial chromosome, a Mammalian Artificial Chromosome (MAC) (see e.g. US
6,133,503 or US 6,077,697), an Insect Artificial Chromosome (BUGAC), an Avian
Artificial Chromosome (AVAC), a Bacteriophage Artificial Chromosome, a
Baculovirus Artificial Chromosome, a plant artificial chromosome (US 5,270,201), a
BIBAC vector (US 5,977,439) or a Human Artificial Chromosome (HAC).

The artificial chromosome is preferably so large that the host cell perceives it as a
"real" chromosome and maintains it and transmits it as a chromosome. For yeast
and other suitable host species, this will often correspond approximately to the size
of the smallest native chromosome in the species. For Saccharomyces, the smallest
chromosome has a size of 225 kb.

MACs may be used to construct artificial chromosomes from other species, such as
insect and fish species. The artificial chromosomes preferably are fully functional
stable chromosomes. Two types of artificial chromosomes may be used. One type,
referred to as SATACs (satellite artificial chromosomes), are stable heterochromatic
chromosomes, and the other type are minichromosomes based on amplification of
euchromatin.

Mammalian artificial chromosomes provide extra-genomic specific integration sites
for introduction of genes encoding proteins of interest and permit megabase size
DNA integration, such as integration of concatemers according to the invention.

According to another embodiment of the invention, the concatemer may be
integrated into the host chromosomes or cloned into other types of vectors, such as
a plasmid vector, a phage vector, a viral vector or a cosmid vector.

A preferable artificial chromosome vector is one that is capable of being
conditionally amplified in the host cell, e.g. in yeast. The amplification preferably is at
least a 10 fold amplification. Furthermore, it is advantageous that the cloning site of
the artificial chromosome vector can be modified to comprise the same restriction
site as the one bordering the cassettes described above, i.e. RS2 and/or RS2'.

It is also conceivable that recombination can be used to generate concatemers, e.g.
through the modification of techniques like the Creator system (Clontech), which
uses the Cre-loxP mechanism (ref: Sauer B 1993 Methods Enzymol 225:890-900) to
directionally join DNA molecules by recombination, or like the Gateway system (Life
Technologies, US 5,888,732) using lambda att attachment sites for directional
recombination (Landy A 1989, Ann Rev Biochem 58:913). It is envisaged that
lambda cos site dependent systems can also be developed to allow concatenation.

The concatemer may be assembled or concatenated by concatenation of at least
two cassettes of nucleotide sequences, each cassette comprising a first sticky end, a
spacer sequence, a promoter, an expressible nucleotide sequence, a terminator,
and a second sticky end. A flow chart of the procedure is shown in figure 10a.

Preferably concatenation further comprises

starting from a primary vector [RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'],
wherein X denotes an expressible nucleotide sequence,
RS1 and RS1' denote restriction sites,
RS2 and RS2' denote restriction sites different from RS1 and RS1',
SP denotes a spacer sequence of at least two nucleotides,
PR denotes a promoter,
TR denotes a terminator,

i) cutting the primary vector with the aid of at least one restriction
enzyme specific for RS2 and RS2' obtaining cassettes having the
general formula [rs2-SP-PR-X-TR-SP-rs1] wherein rs1 and rs2 together
denote a functional restriction site RS2 or RS2',

ii) assembling the cut out cassettes through interaction between rs1 and
rs2.
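
Steps i) and ii) can be illustrated in silico. The following Python sketch is a
simplified model that assumes RS2 and RS2' are both AscI sites (GG^CGCGCC,
the enzyme used in the Examples) and that only the top strand is tracked. The
spacer, promoter, gene and terminator strings are hypothetical placeholders, and
head-to-tail joining is assumed although a palindromic site would also permit
inverted joining.

```python
ASCI_SITE = "GGCGCGCC"   # AscI recognition sequence
ASCI_CUT_OFFSET = 2      # top strand is cut after the leading GG


def excise_cassette(vector: str) -> str:
    """Return the top-strand fragment lying between the two AscI cut points."""
    first = vector.find(ASCI_SITE)
    second = vector.find(ASCI_SITE, first + 1)
    if first == -1 or second == -1:
        raise ValueError("expected two AscI sites flanking the cassette")
    return vector[first + ASCI_CUT_OFFSET: second + ASCI_CUT_OFFSET]


def concatenate(cassettes: list[str]) -> str:
    """Join cassettes head to tail; each junction restores one AscI site."""
    return "".join(cassettes)


# Hypothetical placeholder elements (assumptions for illustration only).
SP, PR, X, TR = "AT", "TTGACA", "ATGAAATAA", "TTTTT"
inner = SP + PR + X + TR + SP
vector = "AAAA" + ASCI_SITE + inner + ASCI_SITE + "TTTT"

cassette = excise_cassette(vector)            # rs2-SP-PR-X-TR-SP-rs1
concatemer = concatenate([cassette] * 3)
print(cassette)
print(concatemer.count(ASCI_SITE))            # restored sites at the junctions
```

In this model the half-sites rs1 (GG) and rs2 (CGCGCC) recombine into a complete
site at every junction, which is what allows the cassettes to be excised again later.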
According to an especially preferred embodiment, vector arms each having a RS2
or RS2' in one end and a non-complementary overhang or a blunt end in the other
end are added to the concatenation mixture together with the cassettes described
above to further simplify the procedure (see Fig. 10b). One example of a suitable
vector for providing vector arms is disclosed in Fig. 16. TRP1, URA3, and HIS3 are
auxotrophic marker genes, and AmpR is an antibiotic marker gene. CEN4 is a
centromere and TEL are telomeres. ARS1 and PMB1 allow replication in yeast and E.
coli respectively. BamHI and AscI are restriction enzyme recognition sites. The
nucleotide sequence of the vector is set forth in SEQ ID NO 4. The vector is
digested with BamHI and AscI to liberate the vector arms, which are used for ligation
to the concatemer.

The ratio of vector arms to cassettes determines the maximum number of cassettes
in the concatemer as illustrated in figure 18a and b. The vector arms preferably are
artificial chromosome vector arms such as those described in Fig. 16. Figure 17
illustrates the synthesis of concatemers from entry vector libraries.

It is of course also possible to add stopper fragments to the concatenation solution,
the stopper fragments each having a RS2 or RS2' in one end and a
non-complementary overhang or a blunt end in the other end. The ratio of stopper
fragments to cassettes can likewise control the maximum size of the concatemer.
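
The effect of the arm or stopper ratio can be approximated with a simple mass
balance. The Python sketch below is an idealised model, not the method of the
invention: it assumes that every cassette is incorporated, that every finished linear
concatemer carries exactly two capping fragments (vector arms or stoppers), and
that circularisation and arm-to-arm ligation are negligible. The term "cap" and the
function names are introduced here for illustration only.

```python
def mean_cassettes_per_concatemer(cassette_molecules: float,
                                  cap_molecules: float) -> float:
    """Average cassettes per concatemer when each concatemer uses two caps."""
    if cassette_molecules <= 0 or cap_molecules <= 0:
        raise ValueError("molecule counts must be positive")
    return 2 * cassette_molecules / cap_molecules


def caps_needed(cassette_molecules: float, target_mean: float) -> float:
    """Number of capping fragments giving the target average concatemer length."""
    if target_mean <= 0:
        raise ValueError("target_mean must be positive")
    return 2 * cassette_molecules / target_mean


print(mean_cassettes_per_concatemer(1000, 100))
print(caps_needed(1000, 20))
```

Under these assumptions, halving the amount of capping fragments doubles the
average concatemer length; real ligations would be expected to give a broad size
distribution around such an average.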

As an alternative to providing vector arms for the concatenation procedure, it is
possible to ligate the concatemer into an artificial chromosome selected from the
group comprising yeast artificial chromosome, mega yeast artificial chromosome,
bacterial artificial chromosome, mouse artificial chromosome, human artificial
chromosome.

The number of concatemers in one single cell may be at least one concatemer per
cell, preferably at least 2 concatemers per cell, more preferably 3 per cell, such as 4
per cell, more preferably 5 per cell, such as at least 5 per cell, for example at least 6
per cell, such as 7, 8, 9 or 10 per cell, for example more than 10 per cell. As
described above, each concatemer may preferably comprise up to 1000 cassettes,
and it is envisaged that one concatemer may comprise up to 2000 cassettes. By
inserting up to 10 concatemers into one single cell, this cell may thus be enriched
with up to 20,000 heterologous expressible genes, which under suitable conditions
may be turned on and off by regulation of the regulatable promoters.

Often it is more preferable to provide cells having anywhere between 10 and 1000
heterologous genes, such as 20-900 heterologous genes, for example 30 to 800
heterologous genes, such as 40 to 700 heterologous genes, for example 50 to 600
heterologous genes, such as from 60 to 300 heterologous genes or from 100 to 400
heterologous genes which are inserted as 2 to 4 artificial chromosomes each
containing one concatemer of genes. The genes may advantageously be located on
1 to 10 such as from 2 to 5 different concatemers in the cells. Each concatemer may
advantageously comprise from 10 to 200 genes, such as from 20 to 100 genes, for
example from 30 to 60 genes, or from 50 to 100 genes.

EXAMPLES 

Example 1: Rescue of expression cassettes from EVAC clones and creation of new
EVAC libraries

Cells contain EVAC libraries with the following characteristics:
• EVAC markers: Uracil and Tryptophan

• cDNA libraries used for EVAC construction, e.g., 20% carrot (root), 20% Phaffia
(whole organism), 20% carotenoid gene library, 20% Actinidia deliciosa (whole
organism), 20% Cantharellus cibarius (whole organism)

• Mixture of entry vectors used to clone the cDNAs, e.g., pEVE4, pEVE5, pEVE13
1) Isolate total DNA (as intact as possible in order to make cassette isolation
easier) from an overnight culture using Easy-DNA™ Kit (Invitrogen).

2) Digest overnight 10-15 µg chromosomal DNA with 5 units of AscI per µg of
DNA. The AscI digests the EVAC into single expression cassettes and the
rest of the yeast genome into on average 500 kb fragments.

3) The digested DNA is purified on a "PCR" purification column, High Pure™
PCR Product Purification Kit (Roche) (or a filter which holds back fragments
bigger than 10 kb), where the chromosomal fragments will be retained and
primarily the expression cassettes will be recovered.

4) Digest pEVE1 with AscI and dephosphorylate (pEVE1 is a modified entry
vector, with no linker region in between the AscI sites).

5) Clone expression cassettes into pEVE1 by mixing 0.5 of purified
expression cassettes, 200 ng of entry vector (AscI cut and dephosphorylated)
and 0.5 U of T4 DNA ligase (1 U/µl) (Roche) in 1 x ligation buffer (Roche).
Ligate overnight at 16°C.

6) The ligation mixture is then used to transform E. coli by electroporation using
1 µl of 1:10 diluted ligation mixture per transformation (Current Protocols in
Molecular Biology, section 1.8.4).

7) The synthesis of new EVACs can then be done as described earlier (see
examples below).

New EVACs can be made just from these cassettes or by mixing these with
cassettes from libraries not previously used. The use of cassettes just from this
library would give a larger representation of all possible combinations of the selected
genes. This most likely produces more variations of certain classes of molecules
and is thus used later on in the evolution process when yeasts with characteristics
close to the ideal have been identified.
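
The enzyme amount in step 2) follows directly from the stated dose of 5 units of AscI
per µg of DNA. A minimal Python sketch of that arithmetic is given below; the
function name is hypothetical and the calculation makes no assumption about
reaction volume or enzyme stock concentration, which are not stated above.

```python
def asci_units_required(dna_ug: float, units_per_ug: float = 5.0) -> float:
    """Units of enzyme needed for a given DNA mass at a fixed dose per µg."""
    if dna_ug <= 0 or units_per_ug <= 0:
        raise ValueError("DNA mass and dose must be positive")
    return dna_ug * units_per_ug


# The 10-15 µg range given in step 2) of Example 1.
for mass_ug in (10, 15):
    print(mass_ug, "µg DNA ->", asci_units_required(mass_ug), "units AscI")
```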

Example 2: Physical re-isolation and re-transformation of EVACs.

1. The EVAC containing population is grown in 5 ml of YPD to an OD600 > 1.0

2. Two 100 µl plugs of total DNA are produced as described in BioRad's "CHEF
genomic DNA plug kits" manual, Procedure n,2

3. EVACs are purified and isolated:

a. Plugs are cut and loaded into 3 slots of a pulsed field gel

b. Run PFGE

i. For EVACs < 1000 kb: CHEF III, 1% agarose, 1/2 strength
TBE, 6 V/cm, 14°C, 120° angle, 50-90 sec. switch time, 22 h run
time.

ii. For EVACs > 1000 kb: CHEF III, 1% agarose, 1/2 strength
TBE, 6 V/cm, 14°C, 120° angle, 60-120 sec. switch time, 24 h run
time.

c. Stain one lane to identify position of EVACs

d. Cut corresponding part of the two non-stained lanes and digest the
agarose by agarase treatment following standard procedures, e.g. Pulsed
Field Gel Electrophoresis. A practical approach. (Ed. A.P. Monaco) Oxford
University Press 1995.

e. Concentrate agarased preparation to 100 µL by ultrafiltration (e.g.
Microcon YM-30, Millipore)

f. Add 400 TE to retentate and repeat concentration step. Repeat and
concentrate to 25 µL

4. EVACs are transformed into yeast as before.
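
The two PFGE programmes in step 3b can be captured as data so that the run
conditions are chosen from the expected EVAC size. The Python sketch below
only restates the values given above; the class and function names are
hypothetical, and treating an EVAC of exactly 1000 kb as "large" is an assumption,
since that boundary case is not specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfgeProgram:
    agarose_percent: float
    tbe_strength: float          # fraction of full-strength TBE
    volts_per_cm: float
    temperature_c: float
    angle_deg: float
    switch_time_s: tuple[int, int]
    run_time_h: int


SMALL_EVAC = PfgeProgram(1.0, 0.5, 6.0, 14.0, 120.0, (50, 90), 22)
LARGE_EVAC = PfgeProgram(1.0, 0.5, 6.0, 14.0, 120.0, (60, 120), 24)


def program_for(evac_kb: float) -> PfgeProgram:
    """Pick the run programme; exactly 1000 kb is treated as large (assumption)."""
    if evac_kb <= 0:
        raise ValueError("EVAC size must be positive")
    return SMALL_EVAC if evac_kb < 1000 else LARGE_EVAC


print(program_for(500))
print(program_for(1500))
```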

Example 3: Amplification of EVACs up to 20% of total host DNA prior to EVAC
isolation and re-transformation

A YAC vector containing a conditional centromere (the GAL1 promoter in front of the
centromere) and a heterologous thymidine kinase (TK) marker can be amplified to
constitute up to 20% of the total DNA content in yeast cells. The centromere is
inactivated by inducing transcription from a strong promoter (GAL1) towards
conserved sequences of the centromere. The cells are propagated in media containing
thymidine, sulfanilamide and methotrexate, which selects for cells containing multiple
copies of the YAC (Smith, D. R. et al., 1990, Proc. Natl. Acad. Sci. USA, Vol 87, pp.
8242-8246).
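
To give a feeling for what a 20% share of total DNA means in copies per cell, the
fraction can be converted into a copy number. The Python sketch below is a rough
estimate under stated assumptions: a haploid host nuclear genome of about
12,100 kb (an approximate figure for S. cerevisiae, not taken from this description),
a hypothetical YAC of 100 kb, and no contribution from mitochondrial or other
extrachromosomal DNA.

```python
def yac_copies_per_genome(yac_fraction: float, yac_kb: float,
                          genome_kb: float) -> float:
    """Copies of the YAC per host genome for a given YAC share of total DNA.

    Solves n * yac_kb / (n * yac_kb + genome_kb) = yac_fraction for n.
    """
    if not 0 < yac_fraction < 1:
        raise ValueError("yac_fraction must lie strictly between 0 and 1")
    if yac_kb <= 0 or genome_kb <= 0:
        raise ValueError("sizes must be positive")
    return (yac_fraction / (1 - yac_fraction)) * (genome_kb / yac_kb)


GENOME_KB = 12_100   # approximate haploid yeast genome size (assumption)
YAC_KB = 100         # hypothetical YAC size (assumption)

print(round(yac_copies_per_genome(0.20, YAC_KB, GENOME_KB), 2))
```

Larger artificial chromosomes reach the same DNA share with proportionally fewer
copies.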

Example 4: Sexual crosses of yeast cell populations. (Using selective media for
diploid selection)

Cell populations:

Cell populations with 2 EVACs/cell are obtained by transforming cells that already
contain an EVAC with a second EVAC.

Cell population 1 contains EVACs of type 1 and 2 and the host cells are of mating
type a.

EVACs of type 1:

• Markers: URA3, TRP1, NPT II

• cDNA libraries used to make EVACs, e.g., carrot (root), Aloe humilis (flower),
Narcissus pseudonarcissus (flower), Lycopersicon esculentum (fruit), Olea
europaea (leaves)






• Mixture of entry vectors each library is cloned in, e.g., pEVE4, pEVE5, pEVE13
& pEVE14

EVACs of type 2:

• Markers: LEU2, TRP1, NPT II

• cDNA libraries, e.g., Phaffia (whole organism), Anubias barteri (leaves),
Acremonium diospyri (whole organism), Phycomyces blakesleeanus (whole organism),
Mucor azygosporus (whole organism)

• Mixture of entry vectors each library is cloned in, e.g., pEVE4, pEVE5, pEVE13
& pEVE14

Cell population 2 contains EVACs of type 3 and 4 and the host cells are of mating
type α.

EVACs of type 3:

• Markers: URA3, HIS3, NPT II

• cDNA libraries used, e.g., mouse (skin and placenta), sea urchin (whole
organism), Carassius auratus (whole organism), Paracheirodon axelrodi (whole
organism), Cucumaria japonica (whole organism)

• Mixture of entry vectors each library is cloned in, e.g., pEVE4, pEVE5, pEVE13
& pEVE14

EVACs of type 4:

• Markers: LEU2, HIS3, NPT II

• cDNA libraries used, e.g., Hierodula grandis (head), mouse (eyes), Dyscophus
insularis (skin), Gnathonemus petersii (head), Diapherodes jamaicensis (head)

• Mixture of entry vectors each library is cloned in, e.g., pEVE4, pEVE5, pEVE13
& pEVE14

Cell population 3 contains EVACs of type 1 and 2 and the host cells are a 50/50
mixture of mating types a and α.

• cDNA libraries used to produce EVACs, e.g., Skimmia jap. Rubella (leaves),
Neurospora crassa (whole organism), Mytilus coruscus (whole organism), Pinus
pinaster (leaves), Carica papaya (fruit)

Cell population 4 contains EVACs of type 3 and 4 and the host cells are a 50/50
mixture of mating types a and α.

• cDNA libraries used to produce EVACs, e.g., Cantharellus cibarius (whole
organism), Rhizophora mangle (leaves), Fucus vesiculosus (leaves), Halichondria
okadai (whole organism)
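
The auxotrophic marker sets listed for EVAC types 1 to 4 can be combined to check
which nutritional requirements a given cell covers. The Python sketch below is a
marker-only model with hypothetical function names; it assumes the host strain is
auxotrophic for uracil, tryptophan, leucine and histidine and it ignores the NPT II
marker, which is common to all four types.

```python
EVAC_MARKERS = {
    1: frozenset({"URA3", "TRP1"}),
    2: frozenset({"LEU2", "TRP1"}),
    3: frozenset({"URA3", "HIS3"}),
    4: frozenset({"LEU2", "HIS3"}),
}


def markers_carried(evac_types: set[int]) -> frozenset[str]:
    """Union of the auxotrophic markers on the EVAC types present in a cell."""
    carried: frozenset[str] = frozenset()
    for evac_type in evac_types:
        carried |= EVAC_MARKERS[evac_type]
    return carried


def grows_on_dropout(evac_types: set[int], required: frozenset[str]) -> bool:
    """True if the cell carries every marker the drop-out medium selects for."""
    return required <= markers_carried(evac_types)


REQUIRED = frozenset({"URA3", "TRP1", "LEU2", "HIS3"})

print(grows_on_dropout({1, 2}, REQUIRED))        # haploid from population 1
print(grows_on_dropout({1, 2, 3, 4}, REQUIRED))  # diploid carrying all four types
print(grows_on_dropout({1, 4}, REQUIRED))        # two complementary types
```

In this model a cell holding only types 1 and 4 (or 2 and 3) also carries all four
markers, so the check tracks marker complementation rather than the number of
EVACs present.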

Remixing round 1: 

1) Construction of diploid population. Mix cells from freshly grown overnight
cultures of cell populations 1 and 2. Distribute cells on agar plates (the plates
should allow growth of both haploid strains). Allow mating to proceed for at
least 4 hours at 30°C, then wash cells off from plates and incubate the mating
mixture in a liquid selective medium that will select for the diploid genotype
which contains the 4 different types of EVACs (-URA3, -TRP1, -LEU2,
-HIS3).

2) Biological screen.

a. Induce heterologous genes

b. The diploid cell population is screened for the relevant pharmaceutical
property(ies) and a subset of this population is selected

c. Obtain a 10 times representation of selected population by letting it
grow on selective medium.

d. Divide selected population in 3 portions: (i) store, (ii) sporulate, (iii)
keep for re-screening at a higher selection hurdle

3) Sporulation in liquid media of selected subpopulation.

a. Grow portion (ii) of selected diploid population to an OD600 of 2.5 to
3.0 (~8 x 10^ cells/ml) in selective medium.

b. Transfer 1 ml culture to a sterile, disposable 15 ml polypropylene
tube and centrifuge 5 min at 1200 x g

c. Pour off the supernatant and re-suspend cells in 5 ml sterile water.
Vortex to re-suspend cells and spin as in step b.

d. Pour off supernatant and re-suspend cells in 1 ml of sporulation
medium

e. Shake for 2 to 3 days at ~350 rpm, 30°C, and examine the culture
microscopically for spore formation

Sporulation medium, per liter:
10 g potassium acetate (1%)
1 g yeast extract
0.5 g dextrose

4) Formation of random spores

a. Pellet 1 ml of sporulation culture

b. Re-suspend all the cells from a 1 ml sporulation culture in 5 ml water

c. Add 0.5 ml of a 10000 U Zymolyase-20T solution (ICN
Immunobiologicals) and 10 µl of 2-mercaptoethanol. (The Zymolyase will kill
any diploid cells that did not sporulate as well as haploid cells that
have not mated.)

d. Incubate overnight at 30°C with gentle shaking.

e. Add 5 ml of 1.5% Nonidet P-40 (NP-40). Transfer the suspension to a
15 ml disposable tube and set 15 min. on ice.

f. Sonicate 30 sec. at 50% to 75% full power, then set on ice 2 min.
Repeat twice.

g. Centrifuge spores 10 min. at 1200 x g. Aspirate or pour off
supernatant and re-suspend in 5 ml of 1.5% NP-40. Vortex vigorously.
Repeat twice.

h. Sonicate as in step f (with repeats).

i. Examine the spores by phase contrast microscopy to ensure that no
more spores remain stuck together.

j. (If spores remain stuck to each other, add 2 ml glass beads (Type I,
Sigma) and shake 30 min. at 300 rpm in an Erlenmeyer flask at 30°C.
Let the beads settle and remove the supernatant containing the
spores.)

k. Centrifuge spores 10 min. at 1200 x g. Aspirate or pour off
supernatant and re-suspend in 5 ml of water. Vortex vigorously. Repeat.

l. Count a 10-fold dilution of the treated spores using a hemocytometer.

5) Selection of haploid cells with at least 1 EVAC:

a. Dilute the spores to get 10^ spores/ml in a media that contains
Neomycin phosphotransferase II. (The antibiotic will kill the newly formed
haploid cells that do not contain at least 1 EVAC.)

b. Allow spores to germinate without allowing the culture's OD600 to rise
above 0.5 (in order to prevent new sexual crosses from occurring).

c. Centrifuge 10 min. at 1200 x g and aspirate or pour off supernatant.

The newly formed haploid cell population consists of a 50/50 mixture of a and α
cells which have anywhere from 1 to 4-5 EVACs each, with most of the cells
containing 2 EVACs.
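
The observation that 2 EVACs per haploid cell is the most common outcome is
consistent with a simple segregation model. The Python sketch below assumes,
for illustration only, that the diploid carries 4 EVACs as single copies and that each
EVAC reaches a given spore independently with probability 1/2; cells with no EVAC
are removed by the antibiotic selection in step 5). It does not attempt to model the
occasional cell with 5 EVACs mentioned above.

```python
from math import comb


def evac_count_distribution(n_evacs: int = 4, p: float = 0.5) -> dict[int, float]:
    """Distribution of EVACs per surviving haploid cell (at least one EVAC)."""
    if n_evacs < 1 or not 0 < p < 1:
        raise ValueError("need n_evacs >= 1 and 0 < p < 1")
    raw = {k: comb(n_evacs, k) * p**k * (1 - p) ** (n_evacs - k)
           for k in range(n_evacs + 1)}
    surviving = 1 - raw[0]            # cells with zero EVACs are selected away
    return {k: raw[k] / surviving for k in range(1, n_evacs + 1)}


distribution = evac_count_distribution()
for count, probability in distribution.items():
    print(count, round(probability, 3))
```

Under these assumptions the class with 2 EVACs is the single most frequent one,
although it accounts for less than half of the surviving cells.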

Remixing round 2: 

The newly formed haploid population is mated with population 3.

6) Construction of diploid population.

a. A mixture of the 2 populations is grown overnight.

b. Cells are centrifuged and the supernatant is poured off.

c. Cells are re-suspended in as small an amount of rich medium as
possible.

d. Distribute cells on agar plates and allow mating to proceed for at
least 4 hours at 30°C and then wash cells off from plates.

e. Incubate the mating mixture in a liquid selective medium that will
select for the 4 different markers (-URA3, -TRP1, -LEU2, -LYS2). This
medium selects for at least 70% of all possible diploid combinations.
Some haploid cells which contain the 4 markers will also be able to
survive at this point but the next time Zymolyase is used, the haploid
cells will be killed.

7) Biological Screen. 

a. Combine the population obtained in point 6e) with portion (iii) of the 
population obtained in point 2d). 

b. Induce heterologous genes 

c. The cell population is screened for the relevant pharmaceutical
property(ies) and a subset of these populations is selected

d. Obtain a 10 times representation of selected populations by letting
them grow on selective medium.

e. Divide selected population in 3 portions: (i) store, (ii) sporulate, (iii)
keep for re-screening at a higher selection hurdle

8) Repeat points 3, 4 and 5

Remixing round 3: 

The newly formed haploid population is mated with population 4. 

9) Construction of diploid population.

a. A mixture of the 2 populations is grown overnight.

b. Cells are centrifuged and the supernatant is poured off.

c. Cells are re-suspended in as small an amount of rich medium as
possible.

d. Distribute cells on agar plates and allow mating to proceed for at
least 4 hours at 30°C and then wash cells off from plates.

e. Incubate the mating mixture in a liquid selective medium that will
select for the 4 different markers (-URA3, -TRP1, -LEU2, -LYS2). This
medium selects for at least 90% of all possible diploid combinations.
Some haploid cells which contain the 4 markers will also be able to
survive at this point but the next time Zymolyase is used, the haploid
cells will be killed.

10) Biological screen.

a. Combine the population obtained in point 9e) with portion (iii) of the
population obtained in point 7e).

b. Induce heterologous genes

c. The cell population is screened for the relevant pharmaceutical
property(ies) and a subset of these populations is selected

d. Obtain a 10 times representation of selected populations by letting
them grow on selective medium.

e. Divide selected population in 3 portions: (i) store, (ii) sporulate, (iii)
keep for re-screening at a higher selection hurdle

11) Repeat points 3, 4 and 5

Repeat until the desired pharmaceutical properties have been obtained.

Example 5: Sexual crosses of yeast cell populations. (Using a fluorescence
activated cell sorter (FACS) for diploid selection)

This procedure is very similar to the one described in Example 4, but instead of using
selective media to select for diploid cells, this selection is done simultaneously with
the biological screening.

Remixing round 1:

1) Construction of diploid population. Mix cells from freshly grown overnight
cultures of each cell population. Distribute cells on agar plates (the plates
should allow growth of both haploid strains). Allow mating to proceed for at
least 4 hours at 30°C, then wash cells off from plates and incubate the mating
mixture in a rich medium.

2) Biological screen.

a. Induce heterologous genes

b. The cell population is screened for the relevant pharmaceutical
property(ies) and for ploidy. A subset of this population that has the
required biological property and is diploid is taken forward.

c. Obtain a 10 times representation of selected population.

d. Divide selected population in 3 portions: (i) store, (ii) sporulate, (iii)
keep for re-screening at a higher selection hurdle

3) Repeat the remaining protocol as described in Example 4.

Thus when using FACS for the biological screening it is better to use this method
since it shortens each screening round by 24-36 hours.

Example 6: Sorting of mating types after formation of haploid cells.

Keeping haploid cells from mating with each other is very difficult and requires very
precise control. Thus an alternative is to sort haploid cells every time after haploid
formation. For this purpose it is possible to use antibodies (against a and α receptor)
or a and α mating factors (conjugated with suitable fluorophores/chromophores) to label
the cells of different mating types. The cells are then sorted in a FACS. (In principle
as in chapter 5 of Flow cytometry: a practical approach / edited by Michael G.
Ormerod. 3rd ed., 2000, Oxford University Press).

Example 7: Remixing of spores.

Spores are the most stable form of keeping sporulating species. Thus another
possibility is to mix the spores of 2 different populations and then do random spore
separation and mating.

Example 8: Combination of sexual remixing with physical rescue of expression
cassettes

Every so often in the evolution programme it is advisable to use physical rescue of
expression cassettes and transformation of new EVACs into new host cells in order
to avoid selection of phenotypes that are due to the creation of resistance
mechanisms and mutations in the host cell.

Example 9: Biological screening of haploid cells

It is also possible to screen for cells in their haploid state. In this procedure it is
essential to have optimised for 90% plus mating efficiency in order not to lose the
genetic content of the haploid cells in the remixing step.

Example 10: Preparation of EVACs (Evolvable Artificial Chromosomes) 

1. Essentially full length cDNA libraries are made.

2. cDNA libraries are made using a pool of 4 entry vectors: pEVE4, pEVE5,
pEVE8 and pEVE9 in a proportion of 30:30:1:30. See Figures 12, 13, 14,
and 15.

3. Each cDNA library is normalised essentially as method 4 described in
Bonaldo, MF et al. (1996) Genome Res. 6: 791-806.

4. Coding sequences from a non-normalised yeast (Saccharomyces cerevisiae)
cDNA library are amplified by PCR and are used as driver for subtractive
hybridization against single stranded circular DNA prepared from the
normalized library (Bonaldo, MF et al. (1996) Genome Res. 6: 791-806), in
order to remove household genes. Remaining single stranded circles are
purified, converted to double stranded DNA and used to transform E. coli
DH5α.

5. EVAC (Evolvable Artificial Chromosome) containing cell populations are
made using 10 different normalised and enriched cDNA libraries in each.

Preparation of expression cassettes

1. Inoculate 5 ml of LB-medium (Sigma) containing 100 µg/L ampicillin with
library inoculum corresponding to a 10+ fold representation of the library.
Grow overnight.

2. Make a plasmid miniprep from 1.5 ml of culture (e.g. Qiaprep spin miniprep
kit).

3. Digest plasmid with SrfI.

4. Dephosphorylate fragments and heat inactivate phosphatase (20 min,
80°C).

5. Digest with AscI.

6. Run 1/10 of the reaction on a 1% agarose gel to estimate the amount of fragment.

Preparation of pYAC4-Asc arms

1. Inoculate 150 ml of LB medium (Sigma) with a single colony of DH5α
containing pYAC4-AscI.

2. Grow to an OD600 of approximately 1, harvest cells and make a plasmid preparation.

3. Digest 100 µg pYAC4-AscI with BamHI and AscI.

4. Dephosphorylate fragments and heat inactivate phosphatase (20 min,
80°C).

5. Purify fragments (e.g. Qiaquick Gel Extraction Kit).

6. Run a 1% agarose gel to estimate the amount of fragment.

EVAC Synthesis

1. Mix expression cassette fragments with YAC-arms so that the cassette/arm
ratio is ~1000/1.

2. If needed, concentrate the mixture (use e.g. Microcon YM30) so that the fragment
concentration is > 75 ng/µL of reaction.

3. Add 1 U T4 DNA ligase, incubate at 16°C for 1-3 h. Stop the reaction by adding 1
µL of 500 mM EDTA.

4. Run a pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE, angle
120, temperature 12°C, voltage 5.6 V/cm, switch time ramping 5-25 s,
run time 30 h). Load the sample in 2 lanes.

5. Stain the part of the gel that contains the molecular weight markers.

6. Cut the sample lanes corresponding to MW 100-500 kb.

7. Agarase the gel in high-NaCl agarase buffer, 1 U agarase / 100 mg gel.

8. Concentrate preparation to < 20
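
The quantities in the EVAC synthesis steps lend themselves to a little bench arithmetic. The sketch below is only an illustration: the thresholds (75 ng/µL, 1 µL of 500 mM EDTA, 1 U agarase per 100 mg gel) are taken from the steps above, while the function names and the example masses and volumes are hypothetical.

```python
def max_ligation_volume_ul(fragment_mass_ng: float,
                           min_conc_ng_per_ul: float = 75.0) -> float:
    """Reaction volume at which the fragment concentration equals the
    minimum; the actual volume must stay below this value."""
    return fragment_mass_ng / min_conc_ng_per_ul


def edta_final_mm(reaction_volume_ul: float,
                  edta_volume_ul: float = 1.0,
                  edta_stock_mm: float = 500.0) -> float:
    """Final EDTA concentration after adding the stop solution."""
    return edta_stock_mm * edta_volume_ul / (reaction_volume_ul + edta_volume_ul)


def agarase_units(gel_mass_mg: float, units_per_100_mg: float = 1.0) -> float:
    """Units of agarase needed to digest a gel slice of the given mass."""
    return units_per_100_mg * gel_mass_mg / 100.0


if __name__ == "__main__":
    # Hypothetical example: 1500 ng of fragments, a 19 µL ligation, a 350 mg gel slice.
    print(max_ligation_volume_ul(1500.0))  # upper bound on the reaction volume, in µL
    print(edta_final_mm(19.0))             # final EDTA concentration, in mM
    print(agarase_units(350.0))            # agarase units for the slice
```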

Example 11: EVAC transformation using electroporation

100 ml of YPD is inoculated with one yeast colony and grown to an OD600 of 1.3 to 1.5.
The culture is harvested by centrifuging at 4000 x g and 4°C. The cells are
resuspended in 16 ml sterile H2O. Add 2 ml 10 x TE buffer, pH 7.5 and swirl to mix.
Add 2 ml 10 x lithium acetate solution (1 M, pH 7.5) and swirl to mix. Shake gently
45 min at 30°C. Add 1.0 ml 0.5 M DTE while swirling. Shake gently 15 min at 30°C.
The yeast suspension is diluted to 100 ml with sterile water. The cells are washed
and concentrated by centrifuging at 4000 x g, resuspending the pellet in 50 ml
ice-cold sterile water, centrifuging at 4000 x g, resuspending the pellet in 5 ml ice-cold
sterile water, centrifuging at 4000 x g and resuspending the pellet in 0.1 ml ice-cold
sterile 1 M sorbitol. The electroporation was done using a Bio-Rad Gene Pulser. In a
sterile 1.5-ml microcentrifuge tube 40 µl concentrated yeast cells were mixed with 5
µl 1:10 diluted EVAC preparation. The yeast-DNA mix is transferred to an ice-cold
0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25 µF, 200 Ω.
1 ml ice-cold 1 M sorbitol is added to the cuvette to recover the yeast. Aliquots are
spread on selective plates containing 1 M sorbitol. Incubate at 30°C until colonies
appear.
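
The pulse settings above can be translated into a field strength and a nominal time constant. The following is a minimal sketch, assuming the field strength is simply voltage divided by the cuvette gap and the time constant is approximated by R x C using the set resistance and capacitance; the time constant actually observed depends on the sample and is reported by the instrument. The function names are hypothetical.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal electric field across the cuvette gap."""
    return voltage_kv / gap_cm


def nominal_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant; ohm x microfarad gives microseconds,
    so divide by 1000 to obtain milliseconds."""
    return resistance_ohm * capacitance_uf / 1000.0


if __name__ == "__main__":
    # Settings from the protocol: 1.5 kV, 0.2 cm gap, 200 ohm, 25 microfarad.
    print(field_strength_kv_per_cm(1.5, 0.2), "kV/cm")
    print(nominal_time_constant_ms(200.0, 25.0), "ms")
```

With the settings given in the protocol this corresponds to a nominal field of 7.5 kV/cm and a nominal time constant of 5 ms.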



Example 12: Transformation of EVACs (Evolvable Artificial Chromosomes) into 
hosts that already contain EVACs 





1. Grow the EVAC cell population to mid log, 2 x 10^6 to 2 x 10^7 cells/ml in liquid
medium, at 30°C and with aeration, under selective conditions for the EVACs.

2. Spin to pellet cells at 400 x g for 5 minutes; discard supernatant.

3. Resuspend cells in a total of 9 ml TE, pH 7.5. Spin to pellet cells and discard
supernatant.

4. Gently resuspend cells in 5 ml 0.1 M Lithium/Cesium Acetate solution, pH 7.5.

5. Incubate at 30°C for 1 hour with gentle shaking.

6. Spin at 400 x g for 5 minutes to pellet cells and discard supernatant.

7. Gently resuspend in 1 ml TE, pH 7.5. Cells are now ready for transformation.

8. In a 1.5 ml tube combine:

• 100 µl yeast cells

• 5 µl carrier DNA (10 mg/ml)

• 5 µl Histamine Solution

• 5/100 of an EVAC preparation in a 10 µl volume (max). (One EVAC
preparation is made of 100 µg of entry vector library plasmid mixture)

9. Gently mix and incubate at room temperature for 30 minutes.

10. In a separate tube, combine 0.8 ml 50% (w/v) PEG 4000 and 0.1 ml TE and 0.1
ml of 1 M LiAc for each transformation reaction. Add 1 ml of this PEG/TE/LiAc
mix to each transformation reaction. Mix cells into solution with gentle pipetting.

11. Incubate at 30°C for 1 hour.

12. Heat shock at 42°C for 15 minutes; cool to 30°C.

13. Pellet cells in a microcentrifuge at high speed for 5 seconds and remove
supernatant.

14. Resuspend in 200 µl of rich media and plate in appropriate selective media.

15. Incubate at 30°C for 48-120 hours until transformed colonies appear.
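
When several transformations are run in parallel, the PEG/TE/LiAc mix of step 10 is conveniently prepared as one master mix. The sketch below scales the per-reaction volumes from step 10 and reports the concentrations in the mix itself (before it is added to the cells). The `excess` pipetting allowance and all names are assumptions made for illustration.

```python
# Per-reaction volumes in ml, as given in step 10.
PER_REACTION_ML = {
    "50% (w/v) PEG 4000": 0.8,
    "TE": 0.1,
    "1 M LiAc": 0.1,
}


def master_mix(n_reactions: int, excess: float = 0.1) -> dict:
    """Volumes (ml) for n reactions plus a fractional pipetting excess.
    The default 10% excess is an assumption, not part of the protocol."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    scale = n_reactions * (1.0 + excess)
    return {name: round(vol * scale, 3) for name, vol in PER_REACTION_ML.items()}


def mix_concentrations() -> dict:
    """Concentrations in the finished mix, before addition to the cells."""
    total = sum(PER_REACTION_ML.values())
    return {
        "PEG 4000 (% w/v)": 50.0 * PER_REACTION_ML["50% (w/v) PEG 4000"] / total,
        "LiAc (M)": 1.0 * PER_REACTION_ML["1 M LiAc"] / total,
    }


if __name__ == "__main__":
    print(master_mix(12))
    print(mix_concentrations())
```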

Example 13: Rare restriction enzymes with recognition sequence and 
cleavage points 

In this example, rare restriction enzymes are listed together with their recognition
sequence and cleavage points.

W = A or T; N = A, C, G, or T



13a) Unique, palindromic overhang

AscI        GG^CGCG_CC
AsiSI       GCG_AT^CGC
CciNI       GC^GGCC_GC
CspBI       GC^GGCC_GC
FseI        GG_CCGG^CC
MchAI       GC^GGCC_GC
NotI        GC^GGCC_GC
PacI        TTA_AT^TAA
SbfI        CC_TGCA^GG
SdaI        CC_TGCA^GG
SgfI        GCG_AT^CGC
SgrAI       CR^CCGG_YG
Sse232I     CG^CCGG_CG
Sse8387I    CC_TGCA^GG



13b) No overhang

BstRZ246I   ATTT^AAAT
BstSWI      ATTT^AAAT
MspSWI      ATTT^AAAT
MssI        GTTT^AAAC
PmeI        GTTT^AAAC
SmiI        ATTT^AAAT
SrfI        GCCC^GGGC
SwaI        ATTT^AAAT



13c) Non-palindromic and/or variable overhang

AarI        CACCTGCNNNN^NNNN_
AbeI        CC^TCA_GC
AloI        ^NNNNN_NNNNNNNGAACNNNNNNTCCNNNNNNN_NNNNN^
BaeI        ^NNNNN_NNNNNNNNNNACNNNNGTAYCNNNNNNN_NNNNN^
BbvCI       CC^TCA_GC
CpoI        CG^GWC_CG
CspI        CG^GWC_CG
Pfl27I      RG^GWC_CY
PpiI        ^NNNNN_NNNNNNNGAACNNNNNCTCNNNNNNNN_NNNNN^
PpuMI       RG^GWC_CY
PpuXI       RG^GWC_CY
Psp5II      RG^GWC_CY
PspPPI      RG^GWC_CY
RsrII       CG^GWC_CG
Rsr2I       CG^GWC_CG
SanDI       GG^GWC_CC
SapI        GCTCTTCN^NNN_
SdiI        GGCCN_NNN^NGGCC
SexAI       A^CCWGG_T
SfiI        GGCCN_NNN^NGGCC
Sse1825I    GG^GWC_CC
Sse8647I    AG^GWC_CT
VpaK32I     GCTCTTCN^NNN_




13d) Meganucleases

I-Sce I     TAGGGATAA_CAGG^GTAAT
I-Ceu I     ACGGTC_CTAA^GGTAG
I-Cre I     AAACGTC_GTGA^GACAGTTT
I-Sce II    GGTC_ACCC^TGAAGTA
I-Sce III   GTTTTGG_TAAC^TATTTAT
Endo. SceI  GATGCTGC_AGGC^ATAGGCTTGTTTA
PI-Sce I    GG_GTGC^GGAGAA
PI-Psp I    TGGCAAACAGCTA_TTAT^GGGTATTATGGGT
I-Ppo I     CTCTC_TTAA^GGTAG
HO          TTTCCGC_AACA^GT
I-Tev I     NN_NN^NNTCAGTAGATGTTTTCTTGGTCTACCGTTT

More meganucleases have been identified, but their precise sequence of recognition
has not been determined, see e.g. www.meganuclease.com
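
The cleavage notation used in the lists of this example can be read mechanically. The sketch below assumes a convention that the text does not spell out: "^" marks the cut in the strand shown, "_" marks the position of the cut in the complementary strand, and a site written with "^" only is cut bluntly. It handles sites with a single cut pair only (not double-cutters such as AloI, BaeI or PpiI). All function names are hypothetical.

```python
# Complements for the IUPAC codes that occur in the lists above.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "W": "W", "S": "S", "R": "Y", "Y": "R", "N": "N"}


def parse_site(site: str):
    """Return (recognition sequence, end type, overhang sequence)."""
    if site.count("^") != 1 or site.count("_") > 1:
        raise ValueError("only sites with a single cut pair are supported")
    bases = []
    top = bottom = None
    for ch in site.upper():
        if ch == "^":
            top = len(bases)        # cut position in the strand shown
        elif ch == "_":
            bottom = len(bases)     # cut position in the complementary strand
        else:
            bases.append(ch)
    if bottom is None:
        bottom = top                # blunt cutters are written with '^' only
    seq = "".join(bases)
    overhang = seq[min(top, bottom):max(top, bottom)]
    if top == bottom:
        end_type = "blunt"
    elif top < bottom:
        end_type = "5' overhang"
    else:
        end_type = "3' overhang"
    return seq, end_type, overhang


def is_palindromic(overhang: str) -> bool:
    """True only for fully specified overhangs equal to their reverse complement."""
    if not overhang or not set(overhang) <= set("ACGT"):
        return False
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(overhang))
    return overhang == reverse_complement


if __name__ == "__main__":
    for name, site in [("AscI", "GG^CGCG_CC"), ("FseI", "GG_CCGG^CC"),
                       ("SrfI", "GCCC^GGGC"), ("RsrII", "CG^GWC_CG")]:
        seq, end_type, overhang = parse_site(site)
        print(name, seq, end_type, overhang, is_palindromic(overhang))
```

Under this reading, the enzymes listed under 13a give fully specified palindromic overhangs, those under 13b give blunt ends, and those under 13c give overhangs that contain degenerate positions or are not self-complementary.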



Example 14: Concatemer size limitation experiments (use of stoppers)

Materials used:

pYAC4 (Sigma; Burke et al. 1987, Science, vol 236, p 806) was digested with EcoRI
and BamHI and dephosphorylated.

pSE420 (Invitrogen) was linearised using EcoRI and used as the model fragment
for concatenation.

T4 DNA ligase (Amersham-Pharmacia Biotech) was used for ligation according to
the manufacturer's instructions.

Method: Fragments and arms were mixed in the ratios (concentrations are in arbitrary
units) indicated on the figures. Ligation was allowed to proceed for 1 h at 16°C. The reaction
was stopped by the addition of 1 µL 500 mM EDTA. Products were analysed by
standard agarose GE (1% agarose, ½ strength TBE) or by PFGE (CHEF III, 1%
LMP agarose, ½ strength TBE, angle 120, temperature 12°C, voltage 5.6 V/cm,
switch time ramping 5-25 s, run time 30 h).

The results are shown in Figures 18a and 18b.
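
The way arms act as stoppers can be illustrated with a simple random end-joining model. This model is an assumption made for illustration and is not the analysis used in the experiment: each fragment has two ligatable ends, each arm has one, ligation runs to completion, circularisation is ignored, and the ratio is molar. Walking along a chain from one arm, every next partner is a fragment end with probability 2F/(2F + A), so the number of fragments before the chain is capped by the second arm follows a geometric distribution with mean 2F/A. All names are hypothetical.

```python
import random


def mean_fragments_per_chain(fragment_to_arm_ratio: float) -> float:
    """Mean number of fragments between two arms in the model: 2 * F / A."""
    return 2.0 * fragment_to_arm_ratio


def simulate_chain_lengths(fragment_to_arm_ratio: float,
                           n_chains: int = 10000,
                           seed: int = 0) -> list:
    """Monte Carlo draw of fragments per chain under the same model."""
    rng = random.Random(seed)
    # Probability that the next ligation partner is a fragment end.
    p_fragment = 2.0 * fragment_to_arm_ratio / (2.0 * fragment_to_arm_ratio + 1.0)
    lengths = []
    for _ in range(n_chains):
        n = 0
        while rng.random() < p_fragment:  # keep extending until an arm is drawn
            n += 1
        lengths.append(n)
    return lengths


if __name__ == "__main__":
    # Hypothetical fragment/arm ratios; the ratios actually used are on the figures.
    for ratio in (1, 5, 20):
        lengths = simulate_chain_lengths(ratio)
        print(ratio, mean_fragments_per_chain(ratio), sum(lengths) / len(lengths))
```

In this model, raising the proportion of arms shortens the concatemers, which is the qualitative effect that the use of stoppers is intended to achieve.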

Example 15: Expression of different patterns ("phenotypes") obtained using the
same yeast clones under different expression conditions

Colonies were picked with a sterile toothpick and streaked sequentially onto plates
corresponding to the four repressed and/or induced conditions (-Ura/-Trp, -Ura/-Trp/-Met,
-Ura/-Trp/+200 µM CuSO4, -Ura/-Trp/-Met/+200 µM CuSO4). Results are
shown in Figure 27.

Claims

1. A method of mixing heterologous genes in expression cassettes located on
artificial chromosomes said method comprising the steps of

providing two initial populations of cells that can mate with each other,

said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,

each cell comprising at least a first type of artificial chromosome, the at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each
type of artificial chromosome from each population can be individually selected
for,

mating the cells with each other, and

selecting mated cells that carry at least a subset of the selectable markers
present on the artificial chromosomes in the two initial populations.

2. The method of claim 1, further comprising causing the selected mated cells to
undergo meiosis.

3. The method according to claim 2, where meiosis is performed under conditions
where cells without artificial chromosomes and cells that have not undergone
meiosis do not survive.

4. The method according to claim 1, wherein the subset of markers selected for
comprises at least one marker from an artificial chromosome in each of the
initial populations to ensure selection of mated cells.

5. The method according to any of the preceding claims, wherein the selection for
a subset of the selectable markers includes selecting at least 70 % of all diploid
types present in the mated population.

6. The method according to claim 5, wherein the selection of a subset of the
selectable markers includes selecting at least 80 % of all diploid types present in
the mated population, such as at least 90%, for example at least 95%, such as
at least 99%, for example 100%.

7. The method according to any of the preceding claims, further comprising
screening mated cells for one or more parameters related to a desired
functionality(ies) and selecting cells having a predefined selection criterion(a) to
undergo meiosis and mating.

8. The method according to any of the preceding claims, further comprising
screening cells that have undergone meiosis for at least one parameter related
to a desired functionality(ies) and selecting cells having a predefined selection
criterion(a) to undergo mating and meiosis.

9. The method according to claims 7 or 8, wherein the selection threshold(s)
associated with the desired functionality(ies) is increased for each round of
mating and meiosis.

10. The method according to any of the preceding claims, wherein one screening for
more than one parameter is performed for at least one round of mating and
meiosis.

11. The method according to any of the preceding claims, further comprising
repeating the steps of claims 1 and 2 at least twice, such as 3 times, for
example 4 times, such as 5 times, for example 6 times, such as 7 times, for
example 8 times, such as 9 times, for example 10 times, such as 11 times, for
example 12 times, such as 13 times, for example 14 times, such as 15 times, for
example 16 times, such as 17 times, for example 18 times, such as 19 times, for
example 20 times, such as 25 times, for example at least 30 times, such as at
least 40 times, for example at least 50 times, such as at least 75 times, for
example at least 100 times, such as at least 200 times, for example at least 300
times, such as at least 500 times, for example at least 1000 times.

12. The method according to any of the preceding claims, further comprising
subjecting the populations of cells to physical isolation of artificial chromosomes
from the populations for every 4-5 rounds of meiosis and selection, and
transferring the isolated artificial chromosomes into new host cells.

13. The method according to claim 12, wherein physical isolation comprises
amplification of artificial chromosomes in the host cells.

14. The method according to claim 12, wherein physical isolation comprises cutting
expression cassettes from concatemers of expression cassettes on artificial
chromosomes and re-assembling expression cassettes into an artificial
chromosome vector backbone and transforming these into new host cells.

15. The method according to any of the preceding claims, further comprising
separating cells of the two mating types from each other after meiosis.

16. The method according to any of the preceding claims, further comprising mixing
spores from different populations prior to mating.

17. The method according to any of the preceding claims, further comprising storing
a sub-population of mated and selected cells, while another sub-population
undergoes further meiosis and mating.

18. The method according to any of the preceding claims 2 to 17, further comprising
adding a further population of cells with types of artificial chromosomes
comprising at least two expression cassettes with heterologous genes, the cells
being capable of mating with the cells that have undergone mating and meiosis,
the further population comprising at least 2 cells with combinations of expression
cassettes different from the combinations in the cells of the initial population, the
artificial chromosomes of said further population carrying at least one selectable
marker.

19. The method according to claim 18, wherein the types of artificial chromosomes
of said further population have the same markers as the initial populations.

20. The method according to claim 18, wherein the further population comprises a
50/50 mixture of cells of the two mating types of the initial populations.

21. The method according to claim 18, wherein the further population comprises
cells of one of the mating types of the initial populations.

22. The method according to any of the preceding claims 17-21, further comprising
screening an earlier stored sub-population together with a population that has
undergone at least one further round of meiosis and mating at a higher selection
threshold than the previous screening, selecting cells above the higher selection
threshold, and mating the selected cells with each other.

23. The method according to any of the preceding claims, wherein at least one of
the two initial populations of cells that can mate with each other further carries at
least a second type of artificial chromosome with expression cassettes
comprising heterologous genes, the first and second types of artificial
chromosome carrying at least one selectable marker so that said first and
second type of artificial chromosome can be individually selected for.

24. The method according to claim 23, wherein at least one of the two initial
populations of cells that can mate with each other further carries at least a third
type of artificial chromosome with expression cassettes comprising heterologous
genes, the first, second, and third types of artificial chromosome carrying at least
one selectable marker so that said first, second, and third type of artificial
chromosome can be individually selected for.

25. The method according to claim 24, wherein at least one of the two initial
populations of cells that can mate with each other further carries at least a fourth
type of artificial chromosome with expression cassettes comprising heterologous
genes, the first, second, third, and fourth type of artificial chromosome carrying
at least one selectable marker so that said first, second, third, and fourth type of
artificial chromosome can be individually selected for.

26. The method according to any of the preceding claims, wherein the two initial
populations of cells that can mate with each other carry from 1 to 10 types of
artificial chromosomes, each type of artificial chromosome of each population
carrying at least one selectable marker so that each of the types of artificial
chromosomes from each of the two populations can be individually selected for.

27. The method according to claim 18, wherein the further population of cells with
artificial chromosomes capable of mating with the cells that have undergone
mating and meiosis carry from 1 to 10 types of artificial chromosomes, each type
of artificial chromosome of said further population carrying at least one
selectable marker so that each of the types of artificial chromosomes can be
individually selected for.

28. The method according to any of the preceding claims, wherein each cell carries
2 artificial chromosomes per cell that can mate.

29. The method according to any of the preceding claims, wherein each cell carries
3 artificial chromosomes per cell that can mate.

30. The method according to any of the preceding claims, wherein each artificial
chromosome carries at least two selectable markers, the selectable markers
being allocated to artificial chromosomes so that each type of artificial
chromosome from each population can be individually selected for.

31. The method according to claim 30, wherein at least one marker is located on the
long arm of the artificial chromosomes.

32. The method according to any of the preceding claims, wherein each artificial
chromosome comprises a common selectable marker, said selectable marker
preferably being an auxotrophic marker.

33. The method according to any of the preceding claims, wherein the markers are
selected from drug resistance, colour, morphology, resistance against
electromagnetic radiation, salt tolerance, O2 resistance, fluorochrome probes
and auxotrophy markers, more preferably auxotrophy markers.

34. The method according to any of the preceding claims, wherein the markers are
sequence tags that can be detected by fluorescent probes/stains.

35. The method according to any of the preceding claims, wherein the markers are
selected from the group consisting of NPT II, LEU 2, TRP 1, HIS 3, LYS 2, URA
3, ADE 2, Amyloglucosidase, β-lactamase, CUP 1, G418R, TUNR, KILk1, C230,
SMR1, SFA, HygromycinR, methotrexateR, chloramphenicolR, DiuronR, ZeocinR,
CanavanineR, ARG 4, THR, Luciferase, GUS, GFP, LUX.

36. The method according to any of the preceding claims, wherein the two initial 
populations are of different mating types. 

37. The method according to any of the preceding claims, wherein the two initial 
populations have approximately the same number of cells. 

38. The method according to any of the preceding claims 1-36, wherein the number
of cells in one population is higher than the number of cells in the other
population.

39. The method according to any of the preceding claims, wherein types of artificial
chromosomes with the same marker or combination of markers differ with
respect to combinations of expression cassettes comprising heterologous
genes.

40. The method according to any of the preceding claims, wherein the species of
cells are eukaryotic.

41. The method according to any of the preceding claims, wherein the species of
cells are prokaryotic.

42. The method according to claim 40, wherein the species of cells are fungal cells.

43. The method according to claim 42, wherein the fungal cells are selected from a
spore forming species.

44. The method according to claim 42, wherein the fungal cells are yeast cells.




45. The method according to claim 42, wherein the yeast cells are selected from the
group comprising baker's yeast, Kluyveromyces marxianus, K. lactis,
Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii, Pichia pastoris,
Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica,
Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula
glutinis, Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C.
palmioleophila), Yarrowia lipolytica, Candida guilliermondii, Candida,
Rhodotorula spp., Saccharomycopsis spp., Aureobasidium pullulans, Candida
brumptii, Candida hydrocarbofumarica, Torulopsis, Candida tropicalis,
Saccharomyces cerevisiae, Rhodotorula rubra, Candida flaveri, Eremothecium
ashbyii, Pichia spp., Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen
spp., Schizosaccharomyces pombe (fission yeast), or Torulopsis bombicola.

46. The method according to claim 40, wherein the species of cells are plant cells or
algae cells.

47. The method according to claim 40, wherein the species of cells are animal cells.

48. The method according to claim 41, wherein the species of cells are bacterial
cells such as Escherichia coli, Bacillus subtilis, Streptomyces lividans,
Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus, and
wherein mating is conjugation.

49. The method according to any of the preceding claims, wherein the mated cells
are diploid.

50. The method according to any of the preceding claims, wherein the mated cells
are tetraploid.

51. The method according to any of the preceding claims, wherein the mated cells
are hexaploid.

52. The method according to any of the preceding claims, wherein the expression
cassettes are located on a nucleotide concatemer comprising in the 5'→3'
direction a cassette of nucleotide sequence of the general formula

[rs2-SP-PR-X-TR-SP-rs1]n

wherein

rs1 and rs2 together denote a functional restriction site,
SP individually denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP individually denotes a spacer of at least two nucleotide bases, and
n ≥ 2, and

wherein at least a first cassette is different from a second cassette.



53. The method according to any of the preceding claims, comprising expressible
nucleotide sequences from at least one expression state.

54. The method according to any of the preceding claims, comprising nucleotide
sequences from at least two expression states.

55. The method according to claim 52, wherein the rs1-rs2 restriction site of at least
two cassettes are recognised by the same restriction enzyme, more preferably
are identical.

56. The method according to claim 52, wherein the rs1-rs2 restriction site of
essentially all cassettes are recognised by the same restriction enzyme, more
preferably are identical.

57. The method according to any of the preceding claims, wherein substantially all
expression cassettes on one artificial chromosome are different.

58. The method according to any of the preceding claims, wherein at least one expression
cassette comprises an intron between the promoter and the expressible
nucleotide sequence, more preferably substantially all cassettes comprise an
intron between the promoter and the expressible nucleotide sequence.




59. The method according to any of the preceding claims, wherein the different
combinations of expression cassettes comprise different promoters, and/or
different expressible nucleotide sequences, and/or different spacers and/or
different terminators and/or different introns.

60. The method according to claim 52, wherein n is at least 10, such as at least 15,
for example at least 20, such as at least 25, for example at least 30, such as
from 30 to 60 or more than 60, such as at least 75, for example at least 100,
such as at least 200, for example at least 500, such as at least 750, for example
at least 1000, such as at least 1500, for example at least 2000.

61. The method according to any of the preceding claims, wherein the artificial
chromosome is selected from the group comprising a Yeast Artificial
Chromosome, a mega Yeast Artificial Chromosome, a Bacterial Artificial
Chromosome, a mouse artificial chromosome, a Plant Artificial Chromosome, a
Mammalian Artificial Chromosome, an Insect Artificial Chromosome, an Avian
Artificial Chromosome, a Bacteriophage Artificial Chromosome, a Baculovirus
Artificial Chromosome, or a Human Artificial Chromosome.

62. The method according to claim 53 or 54, wherein the different expression states
represent at least two different tissues, such as at least two organs, such as at
least two species, such as at least two genera.

63. The method according to claim 62, wherein the different species are from at
least two different phyla, such as from at least two different classes, such as
from at least two different divisions, more preferably from at least two different
sub-kingdoms, such as from at least two different kingdoms.

64. The method according to claim 62, wherein one species is a eukaryote and
another species is a prokaryote.

65. The method according to any of the preceding claims, wherein the different
combinations of heterologous genes and/or different combinations of expression
cassettes on artificial chromosomes are designed to minimise the level of repeat
sequences occurring.

66. A method of mixing heterologous genes in expression cassettes located on
artificial chromosomes said method comprising the steps of

providing two initial populations of protoplasts or cells that can be fused,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first type of artificial chromosome, said at least
first type of artificial chromosome comprising both at least two expression
cassettes comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each
type of artificial chromosome from each population can be individually selected
for,

performing protoplast fusion and regeneration of cell walls or performing fusion
of cells, and

selecting fused cells that carry at least a subset of the selectable markers
present on the artificial chromosomes in the two initial populations.

67. The method according to claim 66, further comprising repeating the steps of
claim 66.

68. The method according to claim 66, wherein the species of cells are selected
from fungi, algae, and plants.

69. The method according to claim 66, wherein the species of cells are selected
from prokaryotes.

70. The method according to claim 66, wherein the species of cells are selected
from animal cells, including human cells.

71. The method according to claim 66, wherein the species of cells are selected
from plant, preferably carrot, Arabidopsis thaliana, Nicotiana spp., Nicotiana
tabacum, maize, wheat, rice, soybean, tomato, peanut, potato, sugar beets,
sunflower, yam, rape seed, conifers, and petunia.

72. The method according to any of the preceding claims 66 to 71, further
comprising screening cells that result from protoplast fusion for a desired
functionality(ies) and selecting cells having the desired functionality(ies) above a
defined threshold, isolating protoplasts from these cells and performing
protoplast fusion and cell regeneration on the selected cells.

73. The method according to claim 72, wherein the selection threshold(s)
associated with the desired functionality(ies) is increased for each round of
protoplast isolation and fusion.

74. The method according to any of the preceding claims 66 to 73, further
comprising storing a sub-population of cells regenerated from fused protoplasts,
while another sub-population undergoes protoplast isolation and fusion.

75. The method according to claim 74, further comprising screening an earlier
stored sub-population together with a population that has undergone at least one
further round of protoplast isolation and fusion at a higher selection threshold
than the previous screening, selecting cells above the higher selection threshold,
and performing protoplast fusion on the selected cells.

76. The method according to any of the preceding claims 66 to 75, wherein at least
one of the two initial populations of protoplasts that can fuse with each other
further carries at least a second type of artificial chromosome with expression
cassettes comprising heterologous genes, the first and second type of artificial
chromosome from each population carrying at least one selectable marker so
that said first and second type of artificial chromosome can be individually
selected for.

77. The method according to any of the preceding claims 66 to 76, wherein selection
of a subset of the selectable markers includes selection for at least 70 % of all
fused cell types present in the fused population.

35 
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78. The method according to claim 77, wherein selectin of a subset of the selectable 
markers includes selecting at least 80 % of all fused cell types present in the 
fused population, such as at least 90%» for example at least 95%, such as at 
least 99%, for example 100%. 

79. The method according to any of the preceding claims 66 to 78, further 
comprising repeating the steps of claim 66 at least twrice, such as 3 times, for 
example 4 times, such as 5 times, for example 6 times, such as 7 times, for 
example 8 times, such as 9 times, for example 10 times, such as 11 times, for 
example 12 times, such as 13 tinnes, for example 14 times, such as 15 times, for 
example 16 times, such as 17 times, for example 18 times, such as 19 times, for 
example 20 times, sudi as 25 times, for example at least 30 times, such as at 
least 40 times, for example at least 50 times, such as at least 75 times, for 
example at least 100 times, such as at least 200 times, for example at least 300 
times, such as at least 500 times, for example at least 1000 times. 

80. The method according to any of the preceding dalms, further comprising 
subjecting the populations of cells to physical isolation of artificial chromosomes 
from the populations for every 2-3 rounds of melosis and selection, and 
transfenring the isolated artificial chromosomes into new host cells. 

81. The method according to any of the preceding claims 66-80, wherein the two 
initial populations of ceils carry from 1 to 10 types of artificial chromosomes, 
each type of artificial chromosome of each population carrying at least one 
selectable marker so that each of tiie types of artificial chromosomes from each 
of tiie two populations can be individually selected for. 

82. The method according to any of the preceding claims 66 to 81, further
comprising adding a further population of cells with artificial chromosomes
comprising at least two expression cassettes with heterologous genes, the cells
being capable of fusing with the cells that have undergone fusion, the further
population comprising at least 2 cells with combinations of expression cassettes
different from the combinations in the cells of the initial population, the artificial
chromosomes of said further population carrying at least one selectable marker.

83. The method according to claim 82, wherein the further population of cells with
artificial chromosomes capable of fusing with the cells that have undergone
mating and meiosis carry from 1 to 10 types of artificial chromosomes, each type
of artificial chromosome of said further population carrying at least one
selectable marker so that each of the types of artificial chromosomes can be
individually selected for.

84. The method according to any of the claims 66-83, wherein each cell carries 2
artificial chromosomes per cell/protoplast to be fused.

85. The method according to any of the claims 66-83, wherein each cell carries 3
artificial chromosomes per cell/protoplast to be fused.

86. The method according to any of the preceding claims 66-85, comprising the
features of any of the claims 30-36, 37-39, and 52-65.

87. A method of mixing heterologous genes in expression cassettes located on
artificial chromosomes, said method comprising the steps of

a) obtaining at least one population of cells, the cells of said at least one population
comprising a concatemer of expression cassettes of the following formula:

[rs2-SP-PR-X-TR-SP-rs1]n

wherein
rs1 and rs2 together denote a restriction site,
SP individually denotes a spacer,
PR denotes a promoter, capable of functioning in the cells,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
n ≥ 2,

the cells differing from each other with respect to combinations of
expressible nucleotide sequences and/or promoters,

b) isolating at least some of the cassettes of the selected cells by cutting the
concatemers with a restriction enzyme cleaving rs1rs2,

c) amplifying at least some of the isolated cassettes,

d) assembling the expression cassettes of step c) into artificial chromosomes, and

e) optionally transferring the artificial chromosomes into host cells.
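Step b) of claim 87 can be pictured with a short string-level sketch. It is only an illustration under stated assumptions: the 8-bp site `GGCGCGCC` stands in for rs1rs2, the cassette contents are placeholder labels, and the helper names `digest` and `cassette` are hypothetical. A real enzyme cuts within its site and leaves overhangs; here the site is simply consumed.

```python
def digest(concatemer: str, site: str) -> list[str]:
    """Cut a linear concatemer at every occurrence of `site` and return the
    non-empty fragments between cuts (the site itself is consumed)."""
    return [frag for frag in concatemer.split(site) if frag]


def cassette(promoter: str, x: str, terminator: str, spacer: str = "nn") -> str:
    """Return SP-PR-X-TR-SP, i.e. one cassette without its flanking site halves."""
    return f"{spacer}{promoter}{x}{terminator}{spacer}"


SITE = "GGCGCGCC"  # assumption: placeholder recognition site for rs1rs2

# Three illustrative cassettes differing in promoter and/or expressible sequence.
cassettes = [
    cassette("PRa", "X1", "TR"),
    cassette("PRb", "X2", "TR"),
    cassette("PRa", "X3", "TR"),
]

# Joined cassettes reconstitute the full site between neighbours (n = 3).
concatemer = SITE + SITE.join(cassettes) + SITE

# Step b): cutting at the site releases the individual cassettes.
released = digest(concatemer, SITE)
print(released)
```

The released fragments would then feed steps c) and d); those steps are not modelled here.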

88. The method according to claim 87, wherein amplification of isolated cassettes
comprises PCR with primers for tagging rs1 and rs2.

89. The method according to claim 87, wherein amplification of isolated cassettes
comprises inserting isolated cassettes into a vector having a cloning site
compatible with rs1rs2 and multiplying this vector in a suitable host.

90. The method according to claim 87, further comprising adding further cassettes
for the assembly step.

91. The method according to any of the preceding claims 87 to 90, further
comprising screening cells with assembled artificial chromosomes for a desired
functionality(ies) and selecting cells having the desired functionality(ies).

92. The method according to claim 91, further comprising subjecting the selected
cells to further isolation and amplification of cassettes and assembly of artificial
chromosomes.

93. A method for mixing heterologous genes in expression cassettes located on
artificial chromosomes, said method comprising the steps of

providing two initial populations of cells,

said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,

each cell comprising at least a first type of artificial chromosome, the at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,

the selectable markers being allocated to artificial chromosomes so that each
type of artificial chromosome from each population can be individually selected
for,

mating the cells with each other,
amplifying the artificial chromosomes in the host cells,
isolating the artificial chromosomes,
mixing the isolated artificial chromosomes,
transferring subsets of said isolated and mixed artificial chromosomes into host
cells, and
selecting cells that carry at least a subset of the selectable markers present on
the artificial chromosomes in the two initial populations.
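The pooling, redistribution and selection steps of claim 93 can be sketched as a toy simulation. Everything in it is an assumption made for illustration: chromosomes are plain dictionaries, the marker names URA3 and HIS3 and the gene labels are placeholders, the function name `mix_and_select` is hypothetical, and mating and amplification are not modelled.

```python
import random


def mix_and_select(pop_a, pop_b, per_cell, n_cells, required, seed=0):
    """Pool all chromosomes from both populations, deal random subsets into
    new host cells, and keep the cells whose markers include `required`."""
    rng = random.Random(seed)
    # "Isolating" and "mixing": flatten both populations into one pool.
    pool = [chrom for cell in pop_a + pop_b for chrom in cell]
    # "Transferring subsets": each new host cell receives `per_cell` chromosomes.
    new_cells = [rng.sample(pool, per_cell) for _ in range(n_cells)]
    # "Selecting": keep cells carrying every required selectable marker.
    return [cell for cell in new_cells
            if required <= {chrom["marker"] for chrom in cell}]


# Two initial populations, each chromosome type carrying its own marker.
pop_a = [[{"marker": "URA3", "genes": ("g1", "g2")}],
         [{"marker": "URA3", "genes": ("g3", "g4")}]]
pop_b = [[{"marker": "HIS3", "genes": ("g5", "g6")}],
         [{"marker": "HIS3", "genes": ("g7", "g8")}]]

selected = mix_and_select(pop_a, pop_b, per_cell=2, n_cells=20,
                          required={"URA3", "HIS3"})
for cell in selected:
    print([chrom["genes"] for chrom in cell])
```

Because each population's chromosomes carry a distinct marker, selecting for both markers keeps only cells that received at least one chromosome from each initial population.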

94. The method according to claim 93, further comprising repeating the mixing
process at least once, more preferably at least twice, more preferably at least
three times, such as at least four times, for example at least 5 times, such as at
least 6 times, for example at least 7 times, such as at least 8 times, for example
at least 9 times, such as at least 10 times, for example at least 15 times, such as
at least 20 times, for example at least 25 times.

95. The method according to claim 93, wherein the host cells into which the subsets
of mixed type of artificial chromosomes are transferred already contain artificial
chromosomes with expression cassettes with heterologous genes.

96. A method of mixing expressible nucleotide sequences, said method comprising
the steps of

a) obtaining at least one population of cells, the cells of said at least one population
comprising at least one expression cassette of the following formula:

[rs2-SP-PR-rs1'-X-rs2'-TR-SP-rs1]n

wherein
rs1 and rs2 together denote a restriction site,
rs1' and rs2' together denote a different restriction site,
SP individually denotes an optional spacer,
PR denotes a promoter, capable of functioning in the cells,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
n ≥ 2,

b) isolating at least some of the expressible nucleotide sequences of the selected
cells by cutting the cassettes with a restriction enzyme cleaving rs1'rs2', or by
amplifying the sequences with primer pairs templating sequences in rs1' and
rs2',

c) re-inserting the expressible nucleotide sequences into another similar backbone,

d) re-mixing the expression cassettes, and

e) transferring the re-mixed expression cassettes into host cells.
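Steps b) to d) of claim 96 amount to moving the expressible sequences X between promoter/terminator backbones. The sketch below shows that bookkeeping only, under assumptions: a cassette is reduced to three labels, the `Cassette` type and `reshuffle` function are hypothetical names, and the random reassignment stands in for whatever re-insertion scheme is actually used.

```python
import random
from typing import NamedTuple


class Cassette(NamedTuple):
    promoter: str    # PR
    x: str           # expressible nucleotide sequence X
    terminator: str  # TR


def reshuffle(cassettes, seed=0):
    """Excise every X (step b) and re-insert it into a randomly chosen
    backbone taken from the same set of cassettes (step c)."""
    rng = random.Random(seed)
    xs = [c.x for c in cassettes]   # excised expressible sequences
    rng.shuffle(xs)                 # random reassignment to backbones
    return [Cassette(c.promoter, x, c.terminator)
            for c, x in zip(cassettes, xs)]


cassettes = [
    Cassette("PRa", "X1", "TRa"),
    Cassette("PRb", "X2", "TRb"),
    Cassette("PRc", "X3", "TRc"),
]
mixed = reshuffle(cassettes)
for c in mixed:
    print(c)
```

Every X survives the round and every backbone is reused once; only the pairing of X with promoter and terminator changes.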

97. The method according to claim 96, wherein the isolated expressible nucleotide
sequences are inserted into primary vectors comprising a nucleotide sequence
cassette of the general formula in 5'->3' direction:

[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']

wherein
RS1 and RS1' denote restriction sites,
RS2 and RS2' denote restriction sites different from RS1 and RS1',
SP individually denotes a spacer sequence of at least two nucleotides,
PR denotes a promoter,
CS denotes a cloning site,
TR denotes a terminator.

98. The method according to claim 96 or 97, further comprising mixing artificial
chromosomes by the steps of

providing two initial populations of cells that can mate with each other,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of expression
cassettes as defined in claim 96,
each cell comprising at least a first type of artificial chromosome, the at least first
type of artificial chromosome comprising both at least two expression cassettes
comprising heterologous genes and at least one selectable marker,
the selectable markers being allocated to artificial chromosomes so that each
type of artificial chromosome from each population can be individually selected
for,
mating the cells with each other, and
selecting mated cells that carry at least a subset of the selectable markers
present on the artificial chromosomes in the two initial populations.

99. A method of mixing heterologous genes in expression cassettes located on
plasmids, said method comprising the steps of

providing two initial populations of cells that can mate with each other,
said initial populations comprising at least 2 cells in each population, and at least
two cells in each population having different combinations of heterologous genes
and/or different combinations of expression cassettes,
each cell comprising at least a first plasmid, the at least first plasmid comprising
both at least two expression cassettes comprising heterologous genes and at
least one selectable marker,
the selectable markers being allocated to plasmids so that each type of plasmid
from each population can be individually selected for,
mating the cells with each other, and
selecting mated cells that carry at least a subset of the selectable markers
present on the plasmids in the two initial populations.

100. The method according to claim 99, wherein the expression cassettes
are located on a nucleotide concatemer comprising in the 5'->3' direction a
cassette of nucleotide sequence of the general formula

[rs2-SP-PR-X-TR-SP-rs1]n

wherein
rs1 and rs2 together denote a functional restriction site,
SP individually denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP individually denotes a spacer of at least two nucleotide bases, and
n ≥ 2, and

wherein at least a first cassette is different from a second cassette.
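The two structural conditions of claim 100 (n ≥ 2, and at least one cassette differing from another) can be written as a small check. This is a sketch with assumed names: `render_concatemer` is hypothetical, and each cassette is reduced to a (PR, X, TR) tuple of labels rather than real sequence.

```python
def render_concatemer(cassettes):
    """Render cassettes as [rs2-SP-PR-X-TR-SP-rs1] units after checking
    that n >= 2 and that at least two cassettes differ."""
    if len(cassettes) < 2:
        raise ValueError("a concatemer needs n >= 2 cassettes")
    if len(set(cassettes)) < 2:
        raise ValueError("at least one cassette must differ from another")
    return "".join(f"[rs2-SP-{pr}-{x}-{tr}-SP-rs1]" for pr, x, tr in cassettes)


print(render_concatemer([("PR1", "X1", "TR"), ("PR2", "X2", "TR")]))
```

A list of identical cassettes, or a single cassette, is rejected with a `ValueError`.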

101. A method for mixing of expression cassettes comprising heterologous
genes located on artificial chromosomes comprising using any of the methods
according to any of the preceding claims in any sequential order.

<120> Methods of mixing large numbers of heterologous genes

<130> P 669 DK00

<160> 5

<170> PatentIn version 3.1

<210> 1
<211> 3417
<212> DNA
<213> Artificial sequence

<220>
<223> Synthetic

<220>
<221> misc_feature
<222> (1902)..(2759)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (959)..(1899)
<223> ColE1

<220>
<221> misc_feature
<222> (2891)..(3347)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (495)..(823)
<223> ADH1

<220>
<221> promoter
<222> (49)..(437)
<223> Met25 promoter

<400> 1

ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggattc ttcggatgca 60 

agggttcgaa tcccttagct ctcattattt tttgcttttt ctcttgaggt cacatgatcg 120 

caaaatggca aatggcacgt gaagctgtcg atattgggga actgtggtgg ttggcaaatg 180

actaattaag ttagtcaagg cgccatcctc atgaaaactg tgtaacataa taaccgaagt 240

gticgaaaagg tggcaccttg tccaattgaa cacgctcgat gaaaaaaata agatatatat 300 

aaggttaagt aaagcgtctg ttagaaagga agtttttcct ttttcttgct ctcttgtctt 360 

ttcatctact at t beet teg tgtaatacag ggtegtcaga tacatagata eaattctatt 420 

acccccatcc ataeaagctt ggcgccgaat tcgtcgaccc ggggatccgc ggccgcaggc 480 

ctaaattgat ctagagcttt ggacttcttc gccagaggtt tggtcaagtc tccaatcaag 540 

gttgteggct tgtetacctt gccagaaatt tacgaaaaga tggaaaaggg tcaaatcgtt 600 

ggtagatacg ttgttgaeae ttetaaataa gcgaatttet tatgatttat gatttttatt 660 

attaaataag ttataaaaaa aataagtgta tacaaatttt aaagtgactc ttaggtttta 720 

aaacgaaaat tettgttctt gagtaactet ttcctgtagg teaggttget tteteaggta 780 

tagcatgagg tcgctcttat tgaccacacc tctaccggca tgcccatggg ttaactgatc 840 

aatgeatcct gcatggcgcg cctgatgagc etgaactgcc cgggeaaatc agetggacgt 900 

ctgcctgcat taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta ttgggcgctc 960 

ttccgcttec tcgctcaetg aetcgctgcg etcggtegtt cggctgcggc gagcggtate 1020 

agctcactca aaggeggtaa tacggttatc cacagaatca ggggataaeg caggaaagaa 1080 

catgtgagea aaaggccage aaaaggccag gaaecgtaaa aaggeegcgt tgctggcgtt 1140 

tttccatagg ctccgccccc ctgaegagca tcacaaaaat cgacgcteaa gtcagaggtg 1200 

gegaaaeeeg acaggactat aaagataeca ggcgtttccc cctggaaget ecctegtgeg 1260 

ctctcctgtt eegacectgc cgcttaccgg atacctgtcc gcctttctec cttcgggaag 1320 

cgtggcgett tctcatagct cacgetgtag gtatctcagt tcggtgtagg tegttegetc 1380 

caagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct tatccggtaa 1440 

ctatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag cagccactgg 1500 

taacaggatt agcagagcga ggtatgtagg eggtgctaca gagttcttga agtggtggce 1560 

taactacggc tacaetagaa ggaeagtatt tggtatctgc getctgctga agccagttac 1620 

cttcggaaaa agagttggta gctcttgatc cggcaaacaa acca:ccgctg gtagcggtgg 1680 

tttttttgtt tgeaagcagc agattacgcg cagaaaaaaa ggatctcaag aagatcettt 1740 

gatettttet acggggtctg aegctcagtg gaaegaaaac teacgttaag ggattttggt 1800 

catgagatta tcaaaaagga tcttcaecta gatcetttta aattaaaaat gaagttttaa 1860 

ateaatctaa agtatatatg agtaaacttg gtetgacagt taccaatgct taatcagtga 1920

ggcacGtatc tcagcgatct gtctatttcg ttcatccata gttgcctgac bccccgbcgb 1980

gtagataact acgatacggg agggcttacc atctggcccc agbgcbgcaa bgabaccgcg 2040

agacccacgc tcaccggctc cagatttatc agcaataaac cagccagccg gaagggccga 2100

gcgcagaagt ggtcctgcaa ctttatccgc ctccabccag bcbabbaabb gbbgccggga 2160

agctagagta agtagttcgc cagttaatag bttgcgcaac gbbgbbgcca bbgcbacagg 2220

catcgtggtg tcacgctcgt cgtttggtat ggcttcattc agcbccggbb cccaacgabc 2280

aaggcgagbt acatgatccc ccatgttgtg caaaaaagcg ^ttagctcct bcggbccbcc 2340

gatcgttgtc agaagtaagt bggccgcagt gtbabcactc abggbbabgg cagcacbgca 2400

taattctctt actgtcatgc catccgtaag atgcttttcb gtgactggtg agtactcaac 2460

caagtcattc tgagaata^t gtatgcggcg accgagbtgc bcbbgcccgg cgbcaabacg 2520

ggataatacc gcgccacata gcagaacttt aaaagtgctc atcattggaa aacgttcttc 2580

ggggcgaaaa ctctcaagga tcttaccgct gbbgagatcc agbbcgabgb aacccacbcg 2640

tgcacccaac tgatcttcag catcttttac bbbcaccagc gbbbcbgggb gagcaaaaac 2700

aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca cggaaabgbb gaabacbcab 2760

actcttcctt tttcaatatt attgaagcat bbabcagggb babbgbcbca bgagcggaba 2820

catatttgaa tgtatttaga aaaataaaca aataggggtt ccgcgcacat bbccccgaaa 2880

agtgccacct gacgcgccct gtagcggcgc abbaagcgcg gcgggbgbgg bggbbacgcg 2940

cagcgtgacc gctacacttg ccagcgccct agcgcccgcb ccbbbcgcbb bctbcccttc 3000

ctttctcgcc acgttcgccg gctttccccg bcaagcbcba aabcgggggc bcccbbtagg 3060

gttccgattt agtgctttac ggcacctcga ccccaaaaaa cbbgabbagg gbgabggbbc 3120

acgtagtggg ccatcgccct gatagacggt bbbbcgcccb bbgacgbbgg agbccacgbb 3180

cttbaatagt ggactcttgt tccaaactgg aacaacacbc aacccbabcb cggbcbabbc 3240

ttttgattta taagggattt tgccgatttc ggccbabbgg bbaaaaaabg agcbgabbba 3300

acaaaaattt aacgcgaatt ttaaceiaaat abbaacgcbb acaabbbcca bbcgccabbc 3360

aggctgcgca actgttggga agggcgatcg gtgcgggcct cbbcgcbabb acgccag 3417



<210> 2
<211> 3501
<212> DNA
<213> Artificial sequence

<220>
<223> Synthetic

<220>
<221> misc_feature
<222> (1986)..(2843)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1043)..(1983)
<223> ColE1

<220>
<221> misc_feature
<222> (2975)..(3431)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (579)..(907)
<223> ADH1

<220>
<221> promoter
<222> (49)..(519)
<223> Cup1 promoter

<400> 2
ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggataa gccgatccca 60

ttaccgacat ttgggcgcta tacgtgcata tgttcatgta tgtatctgta tttaaaacac 120

ttttgtatta tttttcctca tatatgtgta taggtttata cggatgattt aattattact 180

tcaccaccct ttatttcagg ctgatatctt agccttgtta ctagttagaa aaagacattt 240

ttgctgtcag tcactgtcaa gagattcttt tgctggcatt tcttctagaa gcaaaaagag 300

cgatgcgtct tttccgctga accgttccag caaaaaagac taccaacgca atatggattg 360

tcagaatcat ataaaagaga agcaaataac tccttgtctt gtatcaattg cattataata 420

tcttcttgtt agtgcaatat catatagaag tcatcgaaat agatattaag aaaaacaaac 480

tgtacaatca atcaatcaat catcacataa aatgttcaaa gcttggcgcc gaattcgtcg 540

acccggggat ccgcggccgc aggcctaaat tgatctagag ctttggactt cttcgccaga 600

ggtttggtca agtctccaat caaggttgtc ggcttgtcta ccttgccaga aatttacgaa 660

aagatggaaa agggtcaaat cgttggtaga tacgttgttg acacttctaa ataagcgaat 720

ttcttatgat ttatgatttt tattattaaa taagttataa aaaaaataag tgtatacaaa 780

ttttaaagtg actcttaggt tttaaaacga aaattcttgt tcttgagtaa ctctttcctg 840

taggtcaggt tgctttctca ggtatagcat gaggtcgctc ttattgacca cacctctacc 900


ggcatgccca tgggttaact gatcaatgca tcctgcatgg cgcgcctgat gagcctgaac 960

tgcccgggca aatcagctgg acgtctgcct gcattaatga atcggccaac gcgcggggag 1020

aggcggtttg cgtattgggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt 1080

cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga 1140

atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg 1200

taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa 1260

aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagan accaggcgtt 1320

tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct 1380

gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct gtaggtatct 1440

cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc 1500

cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt 1560

atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg tiaggcggtgc 1620

tacagagttc ttgaagtggt ggcctaacba cggctacacb agaaggacag tatttggtat 1680

ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa 1740

acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa 1800

aaaaggatct caagaagatc ctttgatctt tkctacgggg tctgacgctc agtggaacga 1860

aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca cctagatcct 1920

tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa cttggtctga 1980

cagttaccaa tgcttiaatca gtgaggcacc tatctcagcg atctgtctat ttcgttcabc 2040

catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct taccatctgg 2100

ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt tatcagcaat 2160

aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat ccgcctccat 2220

ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta atagtttgcg 2280

caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc 2340

attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa 2400

agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg cagtgttatc 2460

actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg taagatgctt 2520

ttctgtgact 99tgagtact caaccaagtc attctgagu tagtgtatgc ggcgaccgag 2580

ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa ctttaaaagt 2640

gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac cgctgt^gag 2700


atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt ttactttcac 2760

cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg gaataagdgc 2820

gacacggaaa tgttgaatac tcatactctt ccbttttcaa tattattgaa gcatttatca 2880

gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata aacaaatagg 2940

ggttccgcgc acatttcccc gaaaagtgcc acctgacgcg ccctgtagcg gcgcattaag 3000

cgcggcgggt gtggtggtta cgcgcagcgt gaccgctaca cttgccagcg ccctagcgcc 3060

cgctcctttc gctttcttcc cttcctttct cgccacgttc gccggctttc cccgtcaagc 3120

tctaaatcgg gggctccctt tagggtbccg acttagbgct ttacggcacc tcgaccccaa 3180

aaaacttgat tagggtgatg gttcacgtag tgggccatcg ccctgataga cggtttttcg 3240

ccctttgacg ttggagtcca cgttctttaa tagtggactc ttgttccaaa ctggaacaac 3300

actcaaccct abctcggtct attcttttga tttataaggg attttgccga tttcggccta 3360

ttggttaaaa aatgagctga tttaacaaaa atttaacgcg aattttaaca aaatattaac 3420

gcttacaatt tccattcgcc attcaggctg cgcaactgtt gggaagggcg atcggtgcgg 3480

gcctcttcgc tattacgcca g 3501



<210> 3
<211> 4188
<212> DNA
<213> Artificial sequence

<220>
<223> Synthetic

<220>
<221> misc_feature
<222> (2673)..(3530)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1730)..(2670)
<223> ColE1

<220>
<221> misc_feature
<222> (3662)..(4118)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (1027)..(1355)
<223> ADH1

<220>
<221> promoter
<222> (582)..(969)
<223> Met25 promoter

<220>
<221> misc_feature
<222> (1365)..(1603)
<223> ARS1 (autonomous replicating sequence) for yeast replication

<220>
<221> misc_feature
<222> (49)..(574)
<223> lambda spacer DNA (22428-22923)

<400> 3

ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggattc tggaaattgc 60

aacgaaggaa gaaacctcgt tgctggaagc ctggaagaag tatcgggtgt tgctgaaccg 120

tgttgataca tcaactgcac ctgatattga gtggcctgct gtccctgtta tggagtaatc 180

gttttgtgat atgccgcaga aacgttgtat gaaataacgt tctgcggtta gttagtatat 240

tgtaaagctg agtattggtt tatttggcga ttattatctt caggagaata atggaagttc 300

tatgactcaa ttgttcatag tgtttacatc accgccaatt gcttttaaga ctgaacgcat 360

gaaatatggt ttttcgtcat gttttgagtc tgctgttgat atttctaaag tcggtttttt 420

ttcttcgttt tctctaacta ttttccatga aatacatttt tgattattat ttgaatcaat 480

tccaattacc tgaagtcttt catctataat tggcattgta tgtattggtt tattggagta 540

gatgcttgct tttctgagcc ategctctga tatcagatct tcttcggatg caagggttcg 600

aatcccttag etc teat tat tttttgcttt ttctcttgag gtcacatgat cgcaaaatgg 660

caaatggcac gtgaagctgt cgatattggg gaactgtggt ggttggcaaa tgactaatta 720

agttagtcaa ggcgccatcc tcatgaaaac tgtgtaacat aataaccgaa gtgtcgaaaa 780

ggtggcacct tgtccaattg aacacgctcg atgaaaaaaa taagatatat ataaggttaa 840

gtaaagcgtc tgttagaaag gaagtttttc ctttttcttg ctctcttgtc ttttcatcta 900

ctatttcctt cgtgtaatac agggtcgtca gatacataga tacaattcta ttacccccat 960

ccatacaagc ttggcgccga attcgtcgac ccggggatcc gcggccgcag gcctaaattg 1020

atctagagct ttggacttct tcgccagagg tttggtcaag tctccaatca aggttgtcgg 1080

cttgtctacc ttgccagaaa tttacgaaaa gatggaaaag ggtcaaatcg ttggtagata 1140


cgttgttgac acttctaaat aagcgaattt cttatgattt atgattttta ttattaaata 1200

agttataaaa aaaataagtg tatacaaatt ttaaagtgac tcttaggttt taaaacgaaa 1260

attcttgttc ttgagtaact ctttcctgta ggtcaggttg ctttctcagg tatagcatga 1320

ggtcgctctt attgaccaca cctctaccgg catgcccatg ggttcttttg aaaagcaagc 1380

ataaaagatc taaacataaa atctgtaaaa taacaagatg taaagataat gctaaatcat 1440

ttggcttttt gattgattgt acaggaaaat atacatcgca g9999ttgac ttttaccatt 1500

tcaccgcaat ggaatcaaac ttgttgaaga gaatgttcac aggcgcatac gctacaatga 1560

cccgattctt gctagccttt tctcggtctt gcaaacaacc gccaactgat caatgcatcc 1620

tgcatggcgc gcctgatgag cctgaactgc ccgggcaaat cagctggacg tctgcctgca 1680

ttaafcgaatc ggccaacgcg cggggagagg cggtttgcgt attgggcgct cttccgcttc 1740

ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc 1800

aaaggcggta atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc 1860

aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag 1920

gctccgcccc cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc 1980

gacaggacta taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt 2040

tccgaccctg ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct 2100

ttctcatagc tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg 2160

ctgtgtgcaic gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct 2220

tgagtccaac ccggtaagac acgacbtatc gccactggca gcagccactg gtaacaggat 2280

tagcagagcg aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg 2340

ctacactaga aggacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa 2400

aagagttggt agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt 2460

ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa gaagabcc t: b baatctttt^c 2520

tacggggtct gacgctcagt ggaacgaaaa ctcacgttaa ^srgattttgg tcatgagatt 2580

atcaaaaagg atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta 2640

aagtatLatat gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat 2700

ctcagcgatc tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac 2760

tacgatacgg gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg 2820

ctcaccggct ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag 2880



tggtcctgca actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt 2940

aagtagttcg ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt 3000

gtcacgctcg tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt 3060

tacatgatcc cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt 3120

cagaagtaag ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct 3180

tactgtcatg ccatccgtaa gatgcttttc tgtgactggb gagtactcaa ccaagtcatt 3240

ctgagaatag tgtatgcggc gaccgagttg ctcttgcGcg gcgtcaatac gggataatac 3300

cgcgccacat agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa 3360

actctcaagg atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa 3420

ctgatcttca gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca 3480

aaatgccgca aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct 3540

ttttcaatat tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga 3600

atgtanttag aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc 3660

tgacgcgccc tgtagcggcg cattaagcgc ^gcgggtgtg gtggttacgc gcagcgtgac 3720

cgctacactt gccagcgccc tagcgcccgc tcctttcgct ttcttccctt cctttctcgc 3780

cacgttcgcc ggctttcccc gtcaagctct aaatcggggg ctccctttag ggttccgatt 3840

tagtgcttta cggcacctcg accccaaaaa acttgatitag ggtgatggtt cacgtagtgg 3900

gccatcgccc tgatagacgg tttttcgccc tttgacgttg gagtccacgt tctbtaatag 3960

tggactcttg ttccaaactg gaacaacact caaccctatc tcggtctatt cttttgattt 4020

ataagggatt ttgccgattt cggcctattg gttaaaaaat gagctgattt aacaaaaatt 4080

taacgcgaat tttaacaacta tattaacgct tacaatttcc k'ttcgccatt caggctgcgc 4140

aactgttggg aagggcgatc ggtgcgggcc tcttcgctat tacgccag 4188



<210> 4
<211> 11466
<212> DNA
<213> Artificial sequence

<220>
<223> Synthetic

<220>
<221> misc_feature
<222> (3560)..(4247)
<223> Tetrahymena thermophila macronuclear telomere

<220>
<221> misc_feature
<222> (6024)..(6711)
<223> Tetrahymena thermophila macronuclear telomere

<220>
<221> misc_feature
<222> (9644)..(10388)
<223> Autonomous replicating sequence

<220>
<221> misc_feature
<222> (10488)..(11465)
<223> Centromere XV

<220>
<221> rep_origin
<222> (7198)..(7198)
<223> Origin of replication, PMB1

<220>
<221> misc_feature
<222> (1962)..(2765)
<223> URA3, orotidine-5'-phosphate decarboxylase coding sequence

<220>
<221> misc_feature
<222> (4893)..(5552)
<223> HIS3, imidazoleglycerolphosphate dehydratase, coding sequence

<220>
<221> misc_feature
<222> (7956)..(8816)
<223> AP(R), beta-lactamase, wpR ampicillin resistance, coding sequence

<220>
<221> misc_feature
<222> (9129)..(9803)
<223> TRP1, phosphoribosylanthranilate isomerase, coding sequence

<400> 4

ttctcatgtt tgacagctta tcatcgataa gctttaatgc ggtagtttat cacagttaaa 60

ttgctaacgc agtcaggcac cgtgtatgaa atctaacaat gcgctcatcg tcatcctcgg 120

caccgtcacc ctggatgctg taggcatagg cttggttatg ccggtactgc cgggcctctt 180

scgggatatc gtccattccg acagcatcgc cagtcactat ggcgtgctgc tagcgctata 240

tgcgttgatg caatttctat gcgcacccgt tetcggagca ctgtccgacc gctttggccg 300

ccgcccagtc ctgctcgctt egctaettgg agccactatc gaetacgega tcatggcgac 360

cacacccgtc ctgtggatca attcccttta gtataaattt cactctgaac catcttggaa 420

ggaccggtaa ttatttcaaa tctctttttc aattgtatab gtgttatgtt atgtagtata 480

ctctttcttc aacaattaaa tactctcggt agccaagttg gtttaaggcg caagacttta 540

atttatcact acggaattgg cgcgccaatt ccgtaatctt gagatcgggc gttcgatcgc 600

cccgggagat ttttttgttt tttatgtctt ccattcactt cccagacttg eaagttgaaa 660

tatttctttc aagggaattg atectctacg ccggacgcat cgtggccggc atcaccggeg 720

ccacaggtgc ggttgctggc geetatatcg ccgacatcac cgatggggaa gatcgggctc 780

gccacttcgg gc teat gage gcttgtttcg gcgtgggtat ggtggcaggc eccgtggccg 840

ggggactgtt gggcgccate tccttgcatg caccattcct tgcggeggcg gtgctcaaeg 900

gcctcaacct actactgggc tgcttcctaa tgcaggagtc gcataaggga gagcgtcgac 960

cgatgccctt gagagccttc aacccagtca gctccttccg gtgggcgcgg ggcatgacta 1020

tcgtcgccgc acttatgact gtcttcttta tcatgcaact cgtaggacag gtgccggcag 1080

cgctctgggt cattttcggc gaggaccget ttcgctggag cgcgacgatg atcggcctgt 1140

cgcttgcggt attcggaatc ttgcacgecc tcgctcaagc cttcgtcact ggteccgcca 1200

ccaaacgttt cggcgagaag caggccatta tcgccggcat ggfcggccgac gcgctgggct 1260

acgtcttgct ggcgttcgcg aegegaggct ggatggcctt ccccattatg attcttctcg 1320

cttccggcgg catcgggatg cccgcgttgc aggccatgct gtecaggeag gtagatgacg 1380

accatcaggg acagcttcaa ggatcgctcg cggctcttac cagccbaact tcgatcactg 1440


gaccgctgat 


cgtcacggcg 


atttatgceg 


cctcggcgag 


cacatggaac gggttggcat 


1500 


ggattgtagg 


cgccgcccta 


taccttgtct 


gcctccccgc 


gttgcgtcgc ggtgcatgga 


1560 


gccgggccac 


ctcgacctga* 


atggaagccg 


gcggcacctc 


gctaacggat tcaccactcc 


1620 


aagaattgga 


gccaatcaat 


tcttaCQQao 




9^gwaciGicwci cicwcccggca 


J-OoU 


gaacatatcc 


ategcgtccg 


ceatctecag 


cagccgcacg 


cggcgcatcc ccccccccct 


1740 


ttcaattcaa 


ttcateattt 


tttttttatt 


cttttttttg 


atttcggttt ctttgaaatt 


1800 


tttttgattc 


ggtaatctcc 


gaacagaagg 


aagaacgaag 


gaaggagcac agacttagat 


1860 


tggtatatat 


acgcatatgt 


agtgttgaag 


aaacatgaaa 


ttgcccagta ttcttaaccc 


1920 


aactgcacag 


aacaaaaacc 


tgcaggaaac 


gaagataaat 


catgtcgaaa gctacatata 


1980 
aggaacgtgc 


tgctactcat 


cctagtcctg 


etgctgccaa gctatttaat atcatgcacg 


2040 


aaaagcaaac 


aaacttgtgt 


gcttcattgg 


atgwucgcac 


caccaaggaa 


ttactggagt 


2100 


tagttgaagc 


attaggtccc 


aaaatttgtt 


ia ^ Si a e> a 


acatgtggat 


atcttgactg 


2160 


atttttccat 


ggagggcaca 


gttaagccgc 




atccgccaag 


tacaattttt 


2220 


tactcttcga 


agacagaaaa 


tttgctgaca 




agtcaaattg 


cagtactctg 


2280 


cgggtgtata 


cagaatagca 


gaatgggcag 




tgcacacggt 


gtggtgggcc 


2340 


caggtattgt 


tagcggtttg 


aagcaggcgg 




aacaaaggaa cctagaggcc 


2400 


ttttgatgtt 


agcagaattg 


tcatgcaagg 


^^^^ 0^ ^ A 4* 

gccccccacc 


tactggagaa 


tatactaagg 


2460 


gtactgttga 


cattgcgaag 


agcgacaaag 


actttgtcat 


cggctttatt 


gctcaaagag 


2520 


acatgggtgg 


aagagatgaa 


ggttacgatt 


ggc^gaccac 


gacacccggt 


gtgggtttag 


2580 


atgacaaggg 


agacgcattg 


ggtcaacagt 


auagaaccgc 


ggatgatgtg 


gtctctacag 


2640 


gatctgacat 


tattattgtt 


ggaagaggac 


(> a, u t. u g caaa 


gggaagggat 


gctaaggtag 


2700 


agggtgaacg 


ttacagaaaa 


gcaggctggg 




gagaagatgc 


ggccagcaaa 


2760 


actaaaaaac 


tgtattataa 


gtaaatgcat 


s ca uac V aaa 


ctcacaaatt 


agagcttcaa 


2820 


ttHaaHtata 


tcagttatta 


ctcgggcgta 


acga ucc cca 


tciatgacgaa 


aaaaaaaaaa 


2880 


ttggaaagaa 


aagggggggg 


gggcagcgtt 


gggtcctggc 


cacgggtgcg catgatcgtg 


2940 


ctcctgtcgt 


tgaggacccg 


gctaggctgg 


cggggttgcc 


ttactggtta 


gcagaatgaa 


3000 


tcaccgatac 


gcgagcgaac 


gtgaagcgac 


tgctgctgca 


aaacgtctgc 


gacctgagca 


3060 


acaacatgaa 


tggtcttcgg 


tttccgtgtt 


tcgtaaagtc 


tggaaacgcg 


gaagtcagcg 


3120 


ccctgcacca 


ttatgttccg 


gatctgcatc 


gcaggatgct 


gctggctacc 


ctgtggaaca 


3180 


cctacatctg 


tattaacgaa 


gcgctggcat 


tgaccctgag 


tgatttttct 


ctggtcccgc 


3240 


cgcatccata 


ccgccagttg 


tttaccctca 


caacgttcca 


gtaaccgggc 


atgttcatca 


3300 


tcagtaaccc 


gtatcgtgag 


catcctctct 


cgtttcatcg gtatcattac ccccatgaac 


3360 


agaaattccc 


ccttacacgg 


aggcatcaag 


tgaccaaaca 


ggaaaaaacc 


gcccttaaca 


3420 


tggcccgctt 


tatcagaagc 


cagacattaa 


cgcttctgga 


gaaactcaac 


gagctggacg 


3480 


cggatgaaca 


ggcagacatc 


tgtgaatcgc 


ttcacgacca 


cgctgatgag 


ctttaccgca 


3540 


gccctcgagg 


gataagcttc 


atttttagat 


aaaatttatt 


aatcatcatt 


aatttcttga 


3600 


aaaacatttt 


atttattgat 


cttttataac 


aaaaaaccct 


tctaaaagtt 


tatttttgaa 


3660 


tgaaaaactt 


ataaaaattt 


atgaaaacta 


caaaaaataa aatttttaat 


taaaataatt 


3720 


ttgataagaa 


cttcaatctt 


tgactagcta 


gcttagtcat 


ttttgagatt 


taattaatat 


3780 
tttacgttta 


ttcatatata 


aactattcaa 


aatattatag 


aatttaaaca 


ttttaacatc 


3840 


ttaatcattc 


ataaataact 


aaaaatcaaa 


gtattacatc 


aataaataac 


ttttactcaa 


3900 


tgtcaaagaa 


ttattggggt 


tggggttggg 


gttggggttg 


gggttggggt 


tggggttggg 


3960 


gttggggttg 


gggttggggt 


tggggttggg 


gttggggttg 


gggttggggt 


tggggttggg 


4020 


gttggggttg 


gggttggggt 


tggggttggg 


gttggggttg 


gggttggggt 


tggggttggg 


4080 


gttggggttg 


gggttggggt 


tggggttggg 


gttggggttg 


gggttggggt 


tggggttggg 


4140 


gttggggttg 


gggttggggt 


tggggttggg 


gttggggttg 


gggtgggaaa 


acagcattca 


4200 


ggtattagaa 


gaatatcctg 


attcaggtga 


aaatattgtt 


gatgcgcggg atcctcgggg 


4260 


acaccaaata 


tggcgatctc 


ggccttttcg 


tttcttggag 


ctgggacatg 


tttgccatcg 


4320 


atccatctac 


caccagaacg 


gccgttagat 


ctgctgccac 


cgttgtttcc 


accgaagaaa 


4380 


ccaccgttgc 


cgtaaccacc 


acgacggttg 


ttgctaaaga 


agctgccacc 


gccacggcca 


4440 


ccgttgtagc 


cgccgttgtt 


gttattgtag 


ttgctcatgt 


tatttctggc 


acttcttggt 


4500 


tttcctctta 


agtgaggagg 


aacataacca 


ttctcgttgt 


tgtcgttgat 


gcttaaattt 


4560 


tgcacttgtt 


cgctcagttc 


agccataata 


tgaaatgctt 


ttcttgttgt 


tcttacggaa 


4620 


taccacttgc 


cacctatcac 


cacaactaac 


tttttcccgt 


tcctccatct 


cttttatatt 


4680 


ttttttctcg 


atcgagttca 


agagaaaaaa 


aaagaaaaag 


caaaaagaaa 


aaaggaaagc 


4740 


gcgcctcgtt 


cagaatgaca 


cgtatagaat 


gatgcattac 


cttgtcatct 


tcagtatcat 


4800 


actgttcgta 


tacatactta 


ctgacattca 


taggtataca 


tatatacaca 


tgtatatata 


4860 


tcgtatgctg 


cagctttaaa 


taatcggtgt 


cactacataa 


gaacaccttt 


ggtggaggga 


4920 


acatcgttgg 


taccattggg 


cgaggtggct 


tctcttatgg 


caaccgcaag 


agccttgaac 


4980 


gcactctcac 


tacggtgatg 


atcattcttg 


cctcgcagac 


aatcaacgtg 


gagggtaatt 


5040 


ctgctagcct 


ctgcaaagct 


ttcaagaaaa 


tgcgggatca 


tctcgcaaga 


gagatctcct 


5100 


actttctccc 


tttgcaaacc 


aagttcgaca 


actgcgtacg 


gcctgttcga 


aagatctacc 


5160 


accgctctgg 


aaagtgcctc 


atccaaaggc 


gcaaatcctg 


atccaaacct 


ttttactcca 


5220 


cgcgccagta 


gggcctcttt 


aaaagcttga 


ccgagagcaa 


tcccgcagtc 


ttcagtggtg 


5280 


tgatggtcgt 


ctatgtgtaa 


gtcaccaatg 


cactcaacga 


ttagcgacca 


gccggaatgc 


5340 


ttggccagag 


catgtatcat 


atggtccaga 


aaccctatac 


ctgtgtggac 


gttaatcact 


5400 


tgcgattgtg 


tggcctgttc 


tgctactgct 


tctgcctctt 


tttctgggaa 


gatcgagtgc 


5460 


tctatcgcta 


ggggaccacc 


ctttaaagag 


atcgcaatct 


gaatcttggt 


ttcatttgta 


5520 
atacgcttta 


ctagggcttt 


ctgctctgtc 


atctttgcct 


tcgtttatct 


tgcctgctca 


5580 


ttttttagta 


tattcttcga 


agaaatcaca 


ttactttata 


taatgtataa 


ttcattatgt 


5640 


gataatgcca 


atcgctaaga 


aaaaaaaaga 


gtcatccgct 


aggtggaaaa 


aaaaaaatga 


5700 


aaatcattac 


cgaggcataa 


aaaaatatag 


agtgtactag 


aggaggccaa 


gagtaataga 


5760 


aaaagaaaat 


tgcgggaaag 


gactgtgtta 


tgacttccct 


gactaatgcc 


gtgttcaaac 


5820 


gatacctggc 


agtgactcct 


agcgctcacc 


aagctcttaa aacgagaatt 


aagaaaaagt 


5880 


cgtcatcttt 


cgataagttt 


ttcccacagc 


aaagcaatag 


tagaaaaaaa 


caatgggaaa 


5940 


cgttgaatga 


agacaaagcg 


tcgtggctta 


aaaggaaata 


cgctcacgta 


catgctaggg 


6000 


aacaggaccg 


tgcagcggat 


cccgcgcatc 


aacaatattt 


tcacctgaat 


caggatattc 


6060 


ttctaatacc 


tgaatgctgt 


tttcccaccc 


caaccccaac 


cccaacccca 


accccaaccc 


6120 


caaccccaac 


cccaacccca 


accccaaccc 


caaccccaac 


cccaacccca 


accccaaccc 


6180 


caaccccaac 


cccaacccca 


accccaaccc 


caaccccaac 


cccaacccca 


accccaaccc 


6240 


caaccccaac 


cccaacccca 


accccaaccc 


caaccccaac 


cccaacccca 


accccaaccc 


6300 


caaccccaac 


cccaacccca 


accccaaccc 


caaccccaac 


cccaacccca 


accccaataa 


6360 


ttctttgaca 


ttgagtaaaa 


gttatttatt 


gatgtaatac 


tttgattttt 


agttatttat 


6420 


gaatgattaa 


gatgttaaaa 


tgtttaaatt 


ctataatatt 


ttgaatagtt 


tatatatgaa 


6480 


taaacataaa 


atattaatta 


aatctcaaaa 


atgactaagc 


tagctagtca 


aagattgaag 


6540 


ttcttatcaa 


aattatttta 


attaaaaatt 


ttattttttg 


tagttttcat 


aaatttttat 


6600 


aagtttttca 


ttcaaaaata 


aacttttaga 


agggtttttt gttataaaag 


atcaataaat 


6660 


aaaatgtttt 


tcaagaaatt 


aatgatgatt 


aataaatttt 


atctaaaaat 


gaagcttatc 


6720 


cctcgagggc 


tgcctcgcgc 


gtttcggtga 


tgacggtgaa 


aacctctgac 


acatgcagct 


6780 


cccggagacg 


gtcacagctt 


gtctgtaagc 


ggatgccggg 


agcagacaag 


cccgtcaggg 


6840 


cgcgtcagcg 


ggtgttggcg 


ggtgtcgggg 


cgcagccatg 


acccagtcac 


gtagcgatag 


6900 


cggagtgtat 


actggcttaa 


ctatgcggca 


tcagagcaga 


ttgtactgag 


agtgcaccat 


6960 


atgcggtgtg 


aaataccgca 


cagatgcgta 


aggagaaaat 


accgcatcag 


gcgctcttcc 


7020 


gcttcctcgc 


tcactgactc 


gctgcgctcg 


gtcgttcggc 


tgcggcgagc 


ggtatcagct 


7080 


cactcaaagg 


cggtaatacg 


gttatccaca 


gaatcagggg ataacgcagg 


aaagaacatg 


7140 


tgagcaaaag 


gccagcaaaa 


ggccaggaac 


cgtaaaaagg ccgcgttgct 


ggcgtttttc 


7200 


cataggctcc 


gcccccctga 


cgagcatcac 


aaaaatcgac gctcaagtca 


gaggtggcga 


7260 


aacccgacag 


gactataaag 


ataccaggcg 


tttccccctg gaagctccct 


cgtgcgctct 


7320 
cctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtg 7380 

gcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag 7440 

ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactat 7500 

cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaac 7560 

aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg gtggcctaac 7620 

tacggctaca ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttc 7680 

ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag cggtggtttt 7740 

tttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc 7800 

ttttctacgg ggtctgacgc tcagtggaac gaaaactcac gttaagggat tttggtcatg 7860 

agattatcaa aaaggatctt cacctagatc cttttaaatt aaaaatgaag ttttaaatca 7920 

atctaaagta tatatgagta aacttggtct gacagttacc aatgcttaat cagtgaggca 7980 

cctatctcag cgatctgtct atttcgttca tccatagttg cctgactccc cgtcgtgtag 8040 

ataactacga tacgggaggg cttaccatct ggccccagtg ctgcaatgat accgcgagac 8100 

ccacgctcac cggctccaga tttatcagca ataaaccagc cagccggaag ggccgagcgc 8160 

agaagtggtc ctgcaacttt atccgcctcc atccagtcta ttaattgttg ccgggaagct 8220 

agagtaagta gttcgccagt taatagtttg cgcaacgttg ttgccattgc tgcaggcatc 8280 

gtggtgtcac gctcgtcgtt tggtatggct tcattcagct ccggttccca acgatcaagg 8340 

cgagttacat gatcccccat gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc 8400 

gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg ttatggcagc actgcataat 8460 

tctcttactg tcatgccatc cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag 8520 

tcattctgag aatagtgtat gcggcgaccg agttgctctt gcccggcgtc aacacgggat 8580 

aataccgcgc cacatagcag aactttaaaa gtgctcatca ttggaaaacg ttcttcgggg 8640 

cgaaaactct caaggatctt accgctgttg agatccagtt cgatgtaacc cactcgtgca 8700 

cccaactgat cttcagcatc ttttactttc accagcgttt ctgggtgagc aaaaacagga 8760 

aggcaaaatg ccgcaaaaaa gggaataagg gcgacacgga aatgttgaat actcatactc 8820 

ttcctttttc aatattattg aagcatttat cagggttatt gtctcatgag cggatacata 8880 

tttgaatgta tttagaaaaa taaacaaata ggggttccgc gcacatttcc ccgaaaagtg 8940 

ccacctgacg tctaagaaac cattattatc atgacattaa cctataaaaa taggcgtatc 9000 

acgaggccct ttcgtcttca agaattaatt cggtcgaaaa aagaaaagga gagggccaag 9060 
ttggtgacta 


ttgagcacgt 


gagtatacgt 


gattaagcac 


acaaaggcag 


9120 


cttggagtat 


gtctgttatt 


aatttcacag 


gtagttctgg 


tccattggtg 


aaagtttgcg 


9180 


gcttgcagag 


cacagaggcc 


gcagaatgtg 


ctctagattc 


cgatgctgac 


ttgctgggta 


9240 


ttatatgtgt 


gcccaataga 


aagagaacaa 


ttgacccggt 


tattgcaagg 


aaaatttcaa 


9300 


gtcttgtaaa 


agcatataaa 


aatagttcag 


gcactccgaa 


atacttggtt 


ggcgtgtttc 


9360 


gtaatcaacc 


taaggaggat 


gttttggctc 


tggtcaatga 


ttacggcatt 


gatatcgtcc 


9420 


aactgcatgg 


agatgagtcg 


tggcaagaat 


accaagagtt 


cctcggtttg 


ccagttatta 


9480 


aaagactcgt 


atttccaaaa 


gactgcaaca 


tactactcag 


tgcagcttca 


cagaaacctc 


9540 


attcgtttat 


tcccttgttt 


gattcagaag 


caggtgggac 


aggtgaactt 


ttggattgga 


9600 


actcgatttc 


tgactgggtt 


ggaaggcaag 


agagccccga 


aagcttacat 


tttatgttag 


9660 


ctggtggact 


gacgccagaa 


aatgttggtg 


atgcgcttag 


attaaatggc 


gttattggtg 


9720 


ttgatgtaag 


cggaggtgtg 


gagacaaatg 


gtgtaaaaga 


ctctaacaaa 


atagcaaatt 


9780 


tcgtcaaaaa 


tgctaagaaa 


taggttatta 


ctgagtagta 


tttatttaag 


tattgtttgt 


9840 


gcacttgcct 


gcaggccttt 


tgaaaagcaa 


gcataaaaga 


tctaaacata 


aaatctgtaa 


9900 


aataacaaga 


tgtaaagata 


atgctaaatc 


atttggcttt 


ttgattgatt 


gtacaggaaa 


9960 


atatacatcg 


cagggggttg 


acttttacca 


tttcaccgca 


atggaatcaa 


acttgttgaa 


10020 


gagaatgttc 


acaggcgcat 


acgctacaat 


gacccgattc 


ttgctagcct 


tttctcggtc 


10080 


ttgcaaacaa 


ccgccggcag 


cttagtatat 


aaatacacat 


gtacatacct 


ctctccgtat 


10140 


cctcgtaatc 


attttcttgt 


atttatcgtc 


ttttcgctgt 


aaaaacttta 


tcacacttat 


10200 


ctcaaataca 


cttattaacc 


gcttttacta 


ttatcttcta 


cgctgacagt 


aatatcaaac 


10260 


agtgacacat 


attaaacaca 


gtggtttctt 


tgcataaaca 


ccatcagcct 


caagtcgbpa 


10320 


agtaaagatt 


tcgtgttcat 


gcagatagat 


aacaatctat 


atgttgataa 


ttagcgttgc 


10380 


ctcatcaatg 


cgagatccgt 


ttaaccggac 


cctagtgcac 


ttaccccacg 


tbqggbccac 


10440 


tgtgtgccga 


acatgctcct 


tcactatttt 


aacatgtgga 


attaattcta 


aatcctcttt 


10500 


atatgatctg 


ccgatagata 


gttctaagtc 


attgaggttc 


atcaacaatt 


ggattttctg 


10560 


tttactcgac 


ttcaggtaaa 


tgaaatgaga 


tgatacttgc 


ttatctcata 


gttaactcta 


10620 


agaggtgata 


cttatttact 


gtaaaactgt 


gacgataaaa 


ccggaaggaa 


gaataagaaa 


10680 


actcgaactg 


atctataatg 


cctattttct 


gtaaagagtt 


taagctatga 


aagcctcggc 


10740 


attttggccg 


ctcctaggta 


gtgctttttt 


tccaaggaca 


aaacagtttc 


tttttcttga 


10800 


gcaggtttta 


tgtttcggta 


atcatciaaca 


ataaataaat 


tatttcattt 


atgtttaaaa 


10860 




ataaaaaata 


aaaaagtatt 


ttaaattttt 


aaaaaagttg 


attataagca 


tgtgaccttt 


10920 


tgcaagcaat 


taaattttgc 


aatttgtgat 


tttaggcaaa 


agttacaatt 


tctggctcgt 


10980 


gtaatatatg 


tatgctaaag 


tgaactttta 


caaagtcgat 


atggacttag 


tcaaaagaaa 


11040 


ttttcttaaa 


aatatatagc 


actagccaat 


ttagcacttc 


tttatgagat 


atattataga 


11100 


ctttattaag 


ccagatttgt 


gtattatatg 


tatttacccg 


gcgaatcatg 


gacatacatt 


11160 


ctgaaatagg 


taatattctc 


tatggtgaga 


cagcatagat 


aacctaggat 


acaagttaaa 


11220 


agctagtact 


gttttgcagt 


aatttttttc 


ttttttataa 


gaatgttacc 


acctaaataa 


11280 


gttataaagt 


caatagttaa 


gtttgatatt 


tgattgtaaa ataccgtaat 


atatttgcat 


11340 


gatcaaaagg 


ctcaatgttg 


actagccagc atgtcaacca ctatattgat 


caccgatata 


11400 


tggacttcca 


caccaactag 


taatatgaca 


ataaattcaa 


gatattcttc 


atgagaatgg 


11460 


cccaga 
<223> ColE1 



<400> 6 

ctgatttgcc 


cgggcagttc 


aggctcatca 


ggcgcgccat 


gcagggatcg 


gcgttttccg 


60 


gaactggaaa 


accgacatgt 


tgatttcctg 


aaacgggata 


tcatcaaagc 


catgaacaaa 


120 


gcagccgcgc 


tggatgaact 


gataccgggg 


ttgctgagtg 


aatatatcga 


acagtcaggt 


180 


taacaggctg 


cggcattttg 


tccgcgccgg 


gcttcgctca 


ctgttcaggc 


cggagccaca 


240 


gaccgccgtt 


gaatgggcgg 


atgctaatta 


ctatctcccg 


aaagaatccg 


cataccagga 


300 


agggcgctgg 


gaaacactgc cctttcagcg 


ggccatcatg 


aatgcgatgg 


gcagcgacta 


360 


catccgtgag 


gtgaatgtgg 


tgaagtctgc 


ccgtgtcggt 


tattccaaaa 


tgctgctggg 


420 


tgtttatgcc 


tactttatag 


agcataagca 


gcgcaacacc 


cttatctggt 


tgccgacgga 


480 


tggtgatgcc 


gagaacttta 


tgaaaaccca 


cgttgagccg 


actattcgtg 


atattccgtc 


540 


gctgctggcg 


ctggccccgt 


ggtatggcaa 


aaagcaccgg 


gataacacgc 


tcaccatgaa 


600 


gcgtttcact 


aatgggcgtg gcttctggtg 


cctgggcggt 


aaagcggaga 


tcttcttcgg 


660 


atgcaagggt 


tcgaatccct 


tagctctcat 


tattttttgc 


tttttctctt 


gaggtcacat 


720 


gatcgcaaaa 


tggcaaatgg 


cacgtgaagc 


tgtcgatatt 


ggggaactgt 


TOtggttggc 


780 


aaatgactaa 


ttaagttagt 


caaggcgcca 


tcctcatgaa 


aactgtgtaa cataataacc 


840 


gaagtgtcga 


aaaggtggca 


ccttgtccaa 


ttgaacacgc 


tcgatgaaaa aaataagata 


900 


tatataaggt 


taagtaaagc gtctgttaga 


aaggaagttt 


ttcctttttc 


ttgctctctt 


960 


gtcttttcat 


ctactatttc 


cttcgtgtaa 


tacagggtcg 


tcagatacat 


agatacaatt 


1020 


ctattacccc 


catccataca 


agcttggcgc 


cgaattcgtc 


gacccgggga 


tccgcggccg 


1080 


caggcctaaa 


ttgatctaga gctttggact 


tcttcgccag 


aggtttggtc 


aagtctccaa 


1140 


tcaaggttgt 


cggcttgtct 


accttgccag 


aaatttacga 


aaagatggaa 


aagggtcaaa 


1200 


tcgttggtag 


atacgttgtt gacacttcta 


aataagcgaa 


tttcttatga 


tttatgattt 


1260 


ttattattaa 


ataagttata aaaaaaataa 


gtgtatacaa 


attttaaagt 


gactcttagg 


1320 


ttttaaaacg 


aaaattcttg 


ttcttgagta 


actctttcct 


gtaggtcagg ttgctttctc 


1380 


aggtatagca 


tgaggtcgct 


cttattgacc 


acacctctac 


cggcatgccc 


atggatgacc 


1440 


cctccagcgt 


gttttatctc 


tgcgagcata 


atgcctgcgt 


catccgccag caggagctgg 


1500 


actttactga 


tgcccgttat 


atctgcgaaa 


agaccgggat 


ctggacccgt 


gatggcattc 


1560 


tctggttttc 


gtcatccggt 


gaagagattg 


agccacctga 


cagtgtgacc 


tttcacatct 


1620 


ggacagcgta 


cagcccgttc 


accacctggg 


tgcagattgt 


caaagactgg atgaaaacga 


1680 
aaggggatac gggaaaacgt aaaaccttcg taaacaccac gctcggtgag atgatcaatg 1740 

catcctgcat ggcgcgcctg atgagcctga actgcccggg caaatcagct ggacgtctgc 1800 

ctgcattaat gaatcggcca acgcgcgggg agaggcggtt tgcgtattgg gcgctcttcc 1860 

gcttcctcgc tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct 1920 

cactcaaagg cggtaatacg gttatccaca gaatcagggg ataacgcagg aaagaacatg 1980 

tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc 2040 

cataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca gaggtggcga 2100 

aacccgacag gactataaag ataccaggcg tttccccctg gaagctccct cgtgcgctct 2160 

cctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtg 2220 

gcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag 2280 

ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactat 2340 

cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaac 2400 

aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg gtggcctaac 2460 

tacggctaca ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttc 2520 

ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag cggtggtttt 2580 

tttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc 2640 

ttttctacgg ggtctgacgc tcagtggaac gaaaactcac gttaagggat tttggtcatg 2700 

agattatcaa aaaggatctt cacctagatc cttttaaatt aaaaatgaag ttttaaatca 2760 

atctaaagta tatatgagta aacttggtct gacagttacc aatgcttaat cagtgaggca 2820 

cctatctcag cgatctgtct atttcgttca tccatagttg cctgactccc cgtcgtgtag 2880 

ataactacga tacgggaggg cttaccatct ggccccagtg ctgcaatgat accgcgagac 2940 

ccacgctcac cggctccaga tttatcagca ataaaccagc cagccggaag ggccgagcgc 3000 

agaagtggtc ctgcaacttt atccgcctcc atccagtcta ttaattgttg ccgggaagct 3060 

agagtaagta gttcgccagt taatagtttg cgcaacgttg ttgccattgc tacaggcatc 3120 

gtggtgtcac gctcgtcgtt tggtatggct tcattcagct ccggttccca acgatcaagg 3180 

cgagttacat gatcccccat gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc 3240 

gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg ttatggcagc actgcataat 3300 

tctcttactg tcatgccatc cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag 3360 

tcattctgag aatagtgtat gcggcgaccg agttgctctt gcccggcgtc aatacgggat 3420 
aataccgcgc cacatagcag 


aactttaaaa gtgctcatca 


ttggaaaacg 


ttcttcgggg 


3480 


cgaaaactct caaggatctt 


accgctgttg agatccagtt 


cgatgtaacc 


cactcgtgca 


3540 


cccaactgat cttcagcatc 


ttttactttc accagcgttt 


ctgggtgagc 


aaaaacagga 


3600 


aggcaaaatg ccgcaaaaaa 


gggaataagg gcgacacgga 


aatgttgaat 


actcatactc 


3660 


ttcctttttc aatattattg 


aagcatttat cagggttatt 


gtctcatgag 


cggatacata 


3720 


tttgaatgta tttagaaaaa 


taaacaaata ggggttccgc 


gcacatttcc 


ccgaaaagtg 


3780 


ccacctgacg cgccctgtag 


cggcgcatta agcgcggcgg 


gtgtggtggt 


tacgcgcagc 


3840 


gtgaccgcta cacttgccag 


cgccctagcg cccgctcctt 


tcgctttctt 


cccttccttt 


3900 


ctcgccacgt tcgccggctt 


tccccgtcaa gctctaaatc 


gggggctccc 


tttagggttc 


3960 


cgatttagtg ctttacggca 


cctcgacccc aaaaaacttg 


attagggtga 


tggttcacgt 


4020 


agtgggccat cgccctgata 


gacggttttt cgccctttga 


cgttggagtc 


cacgttcttt 


4080 


aatagtggac tcttgttcca 


aactggaaca acactcaacc 


ctatctcggt 


ctattctttt 


4140 


gatttataag ggattttgcc 


gatttcggcc tattggttaa 


aaaatgagct 


gatttaacaa 


4200 


aaatttaacg cgaattttaa 


caaaatatta acgcttacaa 


tttccattcg 


ccattcaggc 


4260 


tgcgcaactg ttgggaaggg 


cgatcggtgc gggcctcttc 


gctattacgc 


cag 


4313 